Levenberg-Marquardt method in Banach spaces with general convex regularization terms


Levenberg-Marquardt method in Banach spaces with general convex regularization terms

Qinian Jin · Hongqi Yang

Qinian Jin
Mathematical Sciences Institute, Australian National University, Canberra, ACT 2601, Australia
E-mail: Qinian.Jin@anu.edu.au

Hongqi Yang
School of Mathematics and Computational Science, Sun Yat-sen University, Guangzhou 510275, PR China
E-mail: mcsyhq@mail.sysu.edu.cn

Abstract We propose a Levenberg-Marquardt method with general uniformly convex regularization terms to solve nonlinear inverse problems in Banach spaces; it is an extension of the scheme proposed by Hanke in [] in the Hilbert space setting. The method is designed so that it can be used to detect features of the sought solutions such as sparsity or piecewise constancy. It can also be used to deal with the situation in which the data are contaminated by noise containing outliers. By using tools from convex analysis in Banach spaces, we establish the convergence of the method. Numerical simulations are reported to test the performance of the method.

Mathematics Subject Classification (2000) 65J15 · 65J20 · 47H17

1 Introduction

Inverse problems arise from many applications in the natural sciences. Most inverse problems are ill-posed in the sense that their solutions do not depend continuously on the data. Thus, the stable reconstruction of solutions of inverse problems requires regularization techniques ([, 8, 5]). For solving nonlinear inverse problems in Hilbert spaces, Hanke introduced in [] his regularizing Levenberg-Marquardt scheme, a stable iterative procedure in which the Lagrange multipliers of the Levenberg-Marquardt method are updated by an adaptive strategy. To describe this method more precisely, consider nonlinear inverse problems that can be formulated in the form

    F(x) = y,                                                                 (1.1)

where F : X → Y is a nonlinear Fréchet differentiable operator between two Hilbert spaces X and Y whose Fréchet derivative at x is denoted by F′(x). Assuming that y^δ is the only available noisy data, the regularizing Levenberg-Marquardt scheme in [] constructs the next iterate x_{n+1} from a current iterate x_n by first regularizing the linearized equation

    F′(x_n)(x − x_n) = y^δ − F(x_n)

at x_n via the Tikhonov regularization

    x_n(α, y^δ) := arg min_{x∈X} { ‖y^δ − F(x_n) − F′(x_n)(x − x_n)‖² + α ‖x − x_n‖² },        (1.2)

using a quadratic penalty term, and then defining x_{n+1} := x_n(α_n, y^δ) with α_n > 0 being a parameter satisfying

    ‖y^δ − F(x_n) − F′(x_n)(x_n(α_n, y^δ) − x_n)‖ = µ ‖y^δ − F(x_n)‖,                         (1.3)

where 0 < µ < 1 is a preassigned number. It has been shown in [] that this defines a regularization method as long as the iteration is terminated by the discrepancy principle.

The regularizing Levenberg-Marquardt scheme has had a far-reaching impact on the development of iterative regularization methods for solving nonlinear inverse problems and has stimulated considerable subsequent work; see [3,7,8,7,30] and the references therein. In order to deal with situations where the sought solutions are sparse or piecewise constant and where the data are contaminated by general noise, one has to use sparsity-promoting functionals or total variation like functionals as regularization terms and general fidelity terms to fit the data. This leads to the consideration of regularization methods in Banach spaces with general regularization terms, which has emerged as a highly active research field in recent years. The monograph [3] collects some of these research works, including variational regularization of Tikhonov type and some iterative regularization methods in Banach spaces. One may further refer to [4, 8, 0 4, 6, 8] for more recent works.

The purpose of the present paper is to extend the regularizing Levenberg-Marquardt scheme of Hanke to solve nonlinear inverse problems in Banach spaces using general convex regularization terms. Thus, we will consider nonlinear inverse problems modelled by (1.1) with F being a nonlinear operator between two Banach spaces X and Y whose dual spaces are denoted by X* and Y* respectively. Let Θ : X → (−∞, ∞] be a proper, lower semi-continuous, uniformly convex function chosen according to the a priori information on the features of the sought solution. Given a current iterate (x_n, ξ_n) ∈ X × X* with ξ_n ∈ ∂Θ(x_n), the subdifferential of Θ at x_n, we define

    x_n(α, y^δ) := arg min_{x∈X} { (1/r) ‖y^δ − F(x_n) − F′(x_n)(x − x_n)‖^r + α D_{ξ_n}Θ(x, x_n) },      (1.4)

where 1 < r < ∞ is a given number and D_{ξ_n}Θ(x, x_n) denotes the Bregman distance induced by Θ at x_n in the direction ξ_n. We then define the next iterate x_{n+1} by x_{n+1} = x_n(α_n, y^δ), where α_n > 0 is chosen such that

    µ_0 ‖y^δ − F(x_n)‖ ≤ ‖y^δ − F(x_n) − F′(x_n)(x_n(α_n, y^δ) − x_n)‖ ≤ µ_1 ‖y^δ − F(x_n)‖

with preassigned numbers 0 < µ_0 ≤ µ_1 < 1. This choice of α_n is a relaxation of (1.3) and offers more flexibility when the root of equation (1.3) is difficult to determine exactly. We also need to update the subgradient ξ_n to some ξ_{n+1} ∈ ∂Θ(x_{n+1}), which, according to the minimization property of x_{n+1}, can be taken as

    ξ_{n+1} = ξ_n + (1/α_n) F′(x_n)* J_r^Y ( y^δ − F(x_n) − F′(x_n)(x_{n+1} − x_n) ),

where F′(x_n)* : Y* → X* denotes the adjoint of F′(x_n) and J_r^Y : Y → Y* denotes the duality mapping on Y with gauge function t ↦ t^{r−1}. We repeat the above procedure until a discrepancy principle is satisfied. The precise description of this extension is given in Algorithm 1. The main result of this paper shows that the extended Levenberg-Marquardt scheme is well-defined and exhibits the regularization property when used to solve ill-posed problems.
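The role of the Bregman distance D_{ξ_n}Θ in (1.4) can be made concrete with a small numerical illustration. The following sketch is not taken from this paper: it uses the finite-dimensional sparsity-promoting choice Θ(x) = ½‖x‖₂² + λ‖x‖₁ (one of the kinds of penalty the method is meant to accommodate) and checks the lower bound D_ξΘ(x̃, x) ≥ ½‖x̃ − x‖₂² that this particular Θ satisfies; all names and values in it are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch (not from the paper): the Bregman distance D_xi Theta(x_tilde, x)
# for the sparsity-promoting choice Theta(x) = 0.5*||x||_2^2 + lam*||x||_1 on R^m.

def theta(x, lam=1.0):
    return 0.5 * np.dot(x, x) + lam * np.sum(np.abs(x))

def subgradient(x, lam=1.0):
    # x + lam*sign(x) is a valid subgradient of Theta at x (sign(0) = 0 is admissible).
    return x + lam * np.sign(x)

def bregman(x_tilde, x, xi, lam=1.0):
    # D_xi Theta(x_tilde, x) = Theta(x_tilde) - Theta(x) - <xi, x_tilde - x>
    return theta(x_tilde, lam) - theta(x, lam) - np.dot(xi, x_tilde - x)

rng = np.random.default_rng(0)
x = rng.normal(size=5)
x_tilde = rng.normal(size=5)
xi = subgradient(x)
d = bregman(x_tilde, x, xi)
# For this Theta the quadratic part already gives D_xi Theta(x_tilde, x) >= 0.5*||x_tilde - x||^2.
print(d, 0.5 * np.linalg.norm(x_tilde - x) ** 2)
```

For this choice of Θ one may take ϕ(t) = t²/2 in the uniform convexity inequality (2.2) below, which is exactly the bound the script checks.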

The introduction of a general convex function Θ into the algorithm presents many challenging issues for its convergence analysis. Unlike (1.2), whose minimizer can be determined explicitly, the minimizer of (1.4) does not have a closed form. The possible non-smoothness of Θ and the non-Hilbertian structure of X and Y prevent us from using the classical techniques. Instead we have to utilize tools from convex optimization and non-smooth analysis, including the subdifferential calculus and the Bregman distance. The convergence analysis becomes even more subtle when considering the regularization property. The main obstacle comes from the stability issue: an iterative sequence constructed using noisy data can split into many possible noise-free iterative sequences as the noise level tends to zero, due to the non-unique determination of α_n. We overcome this difficulty by borrowing an idea from [0], based on a diagonal sequence argument, to show that all these noise-free sequences actually converge uniformly in a certain sense. On the other hand, unlike the variational regularization of Tikhonov type ([3]) and the non-stationary iterated Tikhonov regularization ([]), whose numerical implementation requires solving several non-convex minimization problems, our method involves only convex minimization problems and therefore has the advantage of being implemented efficiently by convex optimization techniques.

Under an a priori choice of {α_n}, a Levenberg-Marquardt method was considered in [] to solve (1.1) with X = L²(Ω) and Y a Hilbert space, using the convex penalty function Θ(x) = a‖x‖²_{L²} + ∫_Ω |Dx|, where ∫_Ω |Dx| denotes the total variation of x. The analysis in [], however, is somewhat preliminary since it provides only weak convergence along subsequences. In contrast, our method chooses {α_n} adaptively, and the whole iterative sequence converges to a solution of (1.1) strongly.

In recent years the iteratively regularized Gauss-Newton method has been extended to solve nonlinear inverse problems in Banach spaces ([,4,6]); it defines {x_n} by

    x_{n+1} = arg min_{x∈X} { (1/r) ‖y^δ − F(x_n) − F′(x_n)(x − x_n)‖^r + α_n D_{ξ_0}Θ(x, x_0) }.

This looks similar to (1.4), but in fact they are essentially different. The iteratively regularized Gauss-Newton method always defines x_{n+1} in a neighborhood of the initial guess x_0, while the Levenberg-Marquardt method defines x_{n+1} in a region around x_n for each n ≥ 0. From the optimization point of view, the Levenberg-Marquardt method is more favorable in nature.

The rest of this paper is organized as follows. In Section 2 we collect some basic results from convex analysis in Banach spaces and prove a continuous dependence result of minimizers of uniformly convex functionals on various parameters. We then present in Section 3 the Levenberg-Marquardt method in Banach spaces with general convex regularization terms and show that the method is well-defined. The detailed convergence analysis is given in Section 4. Finally, in Section 5 we report various numerical results to illustrate the performance of the method.

2 Preliminaries

Let X and Y be two Banach spaces whose norms are denoted by ‖·‖. We use X* and Y* to denote their dual spaces respectively. Given x ∈ X and x* ∈ X* we write ⟨x*, x⟩ = x*(x) for the duality pairing. We use → and ⇀ to denote strong convergence and weak convergence respectively. By L(X, Y) we denote the space of all continuous linear operators from X to Y. For any A ∈ L(X, Y), we use A* : Y* → X* to denote its adjoint, i.e. ⟨A*y*, x⟩ = ⟨y*, Ax⟩ for any x ∈ X and y* ∈ Y*.
We use N(A) = {x ∈ X : Ax = 0} to denote the null space of A and define its annihilator N(A)^⊥ := {ξ ∈ X* : ⟨ξ, x⟩ = 0 for all x ∈ N(A)}.

When X is reflexive, there holds N(A)^⊥ = R(A*)‾, where R(A*) denotes the range space of A* and R(A*)‾ denotes the closure of R(A*) in X*.

For a convex function Θ : X → (−∞, ∞], we use D(Θ) := {x ∈ X : Θ(x) < +∞} to denote its effective domain. We call Θ proper if D(Θ) ≠ ∅. Given x ∈ X we define

    ∂Θ(x) := {ξ ∈ X* : Θ(x̃) − Θ(x) − ⟨ξ, x̃ − x⟩ ≥ 0 for all x̃ ∈ X}.

Any element ξ ∈ ∂Θ(x) is called a subgradient of Θ at x. The multi-valued mapping ∂Θ : X → 2^{X*} is called the subdifferential of Θ. It could happen that ∂Θ(x) = ∅ for some x ∈ D(Θ). Let

    D(∂Θ) := {x ∈ D(Θ) : ∂Θ(x) ≠ ∅}.

For x ∈ D(∂Θ) and ξ ∈ ∂Θ(x) we define

    D_ξΘ(x̃, x) := Θ(x̃) − Θ(x) − ⟨ξ, x̃ − x⟩,    x̃ ∈ X,

which is called the Bregman distance induced by Θ at x in the direction ξ. Clearly D_ξΘ(x̃, x) ≥ 0. By a straightforward calculation one can see that

    D_{ξ_2}Θ(x, x_2) − D_{ξ_1}Θ(x, x_1) = D_{ξ_2}Θ(x_1, x_2) + ⟨ξ_2 − ξ_1, x_1 − x⟩        (2.1)

for all x_1, x_2 ∈ D(∂Θ), ξ_1 ∈ ∂Θ(x_1), ξ_2 ∈ ∂Θ(x_2) and x ∈ X. The Bregman distance can be used to obtain information in terms of the Banach space norm when Θ enjoys stronger convexity. A proper convex function Θ : X → (−∞, ∞] is called uniformly convex if there is a strictly increasing function ϕ : [0, ∞) → [0, ∞) with ϕ(0) = 0 such that

    Θ(λx̃ + (1 − λ)x) + λ(1 − λ)ϕ(‖x̃ − x‖) ≤ λΘ(x̃) + (1 − λ)Θ(x)        (2.2)

for all x̃, x ∈ X and λ ∈ [0, 1]. It is easily seen that if Θ is uniformly convex in the sense of (2.2) then

    D_ξΘ(x̃, x) ≥ ϕ(‖x̃ − x‖)        (2.3)

for all x̃ ∈ X, x ∈ D(∂Θ) and ξ ∈ ∂Θ(x). Moreover, it follows from [3, Proposition 3.5.8] that any proper, weakly lower semi-continuous, uniformly convex function Θ : X → (−∞, ∞] is coercive in the sense that

    lim inf_{‖x‖→∞} Θ(x)/‖x‖² > 0.        (2.4)

On a Banach space X, we consider for 1 < r < ∞ the convex function x ↦ ‖x‖^r / r. Its subdifferential at x is given by

    J_r^X(x) := {ξ ∈ X* : ‖ξ‖ = ‖x‖^{r−1} and ⟨ξ, x⟩ = ‖x‖^r},

which gives the duality mapping J_r^X : X → 2^{X*} with gauge function t ↦ t^{r−1}. We call X uniformly convex if its modulus of convexity

    δ_X(t) := inf{1 − ½‖x̄ + x‖ : ‖x̄‖ = ‖x‖ = 1, ‖x̄ − x‖ ≥ t}

satisfies δ_X(t) > 0 for all 0 < t ≤ 2. We call X uniformly smooth if its modulus of smoothness

    ρ_X(s) := sup{½(‖x̄ + x‖ + ‖x̄ − x‖) − 1 : ‖x̄‖ = 1, ‖x‖ ≤ s}

satisfies lim_{s→0} ρ_X(s)/s = 0. It is well known ([6]) that any uniformly convex or uniformly smooth Banach space is reflexive. On a uniformly smooth Banach space X, every duality mapping J_r^X with 1 < r < ∞ is single valued and uniformly continuous on bounded sets. Furthermore, on a uniformly convex Banach space, any sequence {x_n} satisfying x_n ⇀ x and ‖x_n‖ → ‖x‖ must satisfy x_n → x as n → ∞. This property can be generalized to uniformly convex functions, as we state in the following result.

5 Lemma. ([]) Let Θ : X (, ] be a proper, weakly lower semi-continuous, and uniformly convex function. Then Θ admits the Kadec property, i.e. for any sequence {x n } X satisfying x n x X and Θ(x n ) Θ(x) < there holds x n x as n. We conclude this section by providing a continuous dependence result of minimizers for uniformly convex cost functionals on various parameters. This result is crucial in the forthcoming sections. Lemma. Let X and Y be Banach spaces with X being reflexive. Let {A(x) : X Y} x D L (X, Y) be such that x A(x) is continuous on D X. Let Θ : X (, ] be a proper, weakly lower semi-continuous, uniformly convex function. Assume that the sequences {α (l) } (0, ), {b (l) } Y, {x (l) } D and {ξ (l) } X with ξ (l) Θ(x (l) ) satisfy α (l) ᾱ, b (l) b, x (l) x, ξ (l) ξ as l (.5) for some ᾱ > 0, b Y, x D and ξ X with ξ Θ( x). For r < let { } z (l) := arg min z X r b(l) A(x (l) )z r + α (l) D ξ (l)θ(z, x (l) ). Then z (l) z and Θ(z (l) ) Θ( z) as l, where z := arg min z X { r b A( x)z r + ᾱd ξθ(z, x) Proof It is easy to see that z (l) and z are uniquely defined since the corresponding cost functionals are weakly lower semi-continuous and uniformly convex and hence coercive, see (.4). Because ξ (l) Θ(x (l) ) and ξ Θ( x), we may use the condition (.5) to obtain Therefore lim inf l lim sup l ( Θ(x(l) ) lim inf l Θ(x (l) ) lim sup l }. ) Θ( x) + ξ, x (l) x = Θ( x), ( ) Θ( x) ξ (l), x x (l) = Θ( x). lim l Θ(x(l) ) = Θ( x). (.6) To show z (l) z we will adapt the argument in [9,6]. By the definition of z (l) we have r b(l) A(x (l) )z (l) r + α (l) D ξ (l)θ(z (l), x (l) ) r b(l) A(x (l) )x (l) r. By the given conditions, the right hand side is bounded and thus {D ξ (l)θ(z (l), x (l) )} is bounded. Consequently, {z (l) } is bounded in X by the uniformly convexity of Θ. Since X is reflexive, {z (l) } has a subsequence, denoted by the same notation, such that z (l) z as l for some z X. Since x A(x) is continuous and A( x) L (X, Y), we have A(x (l) )z (l) A( x)z as l. By using (.6) and the weak lower semi-continuity of Θ and the Banach space norm, it follows that b A( x)z lim inf l b(l) A(x (l) )z (l), (.7) D ξθ(z, x) lim inf l D ξ (l)θ(z(l), x (l) ). (.8)

6 Consequently r b A( x)z r + ᾱd ξθ(z, x) lim inf r l b(l) A(x (l) )z (l) r + lim inf l α(l) D ξ (l)θ(z (l), x (l) ) ( ) lim inf l r b(l) A(x (l) )z (l) r + α (l) D ξ (l)θ(z (l), x (l) ) ( ) lim sup l r b(l) A(x (l) )z (l) r + α (l) D ξ (l)θ(z (l), x (l) ) ( ) lim sup l r b(l) A(x (l) ) z r + α (l) D ξ (l)θ( z, x (l) ) = r b A( x) z r + ᾱd ξθ( z, x), where, for the last inequality we used the minimality of z (l) and for the last equality we used (.5) and (.6). By using the minimality and uniqueness of z, we can conclude z = z and ( ) lim l r b(l) A(x (l) )z (l) r + α (l) D ξ (l)θ(z (l), x (l) ) = r b A( x) z r + ᾱd ξθ( z, x). (.9) Next we show that lim D ξ l (l)θ(z(l), x (l) ) = D ξθ( z, x). (.0) According to (.8) and z = z, it suffices to show that γ 0 := lim sup D ξ (l)θ(z (l), x (l) ) D ξθ( z, x) =: γ. l Suppose this is not true, i.e. γ 0 > γ. By taking a subsequence if necessary, we may assume that Then from (.9) it follows that γ 0 = lim l D ξ (l)θ(z (l), x (l) ). lim l r b(l) A(x (l) )z (l) r = r b A( x) z r + ᾱ(γ γ 0 ) < r b A( x) z r which is a contradiction to (.7). We thus obtain (.0). In view of (.0), (.5) and z (l) z as l we have ( ) lim Θ(z (l) ) Θ(x (l) ) = Θ( z) Θ( x). l This together with (.6) implies that lim l Θ(z (l) ) = Θ( z). Since z (l) z, we may use Lemma. to conclude that z (l) z as l.

3 The method

We consider the equation (1.1) arising from nonlinear inverse problems, where F : D(F) ⊂ X → Y is a nonlinear operator between two Banach spaces X and Y with domain D(F). We will assume that (1.1) has a solution. In general (1.1) may have many solutions, and in order to find the desired one some selection criterion must be enforced. According to the a priori information on the sought solution, we choose a proper, weakly lower semi-continuous, uniformly convex function Θ : X → (−∞, ∞]. Taking x_0 ∈ D(Θ) ∩ D(F) and ξ_0 ∈ ∂Θ(x_0) as the initial guess, we define x† to be the solution of (1.1) with the property

    D_{ξ_0}Θ(x†, x_0) = min_{x ∈ D(Θ)∩D(F)} {D_{ξ_0}Θ(x, x_0) : F(x) = y}.        (3.1)

We are interested in developing algorithms to find the solution x† of (1.1). We will work under the following standard conditions.

Assumption 1
(a) X is a reflexive Banach space and Y is a uniformly smooth Banach space;
(b) F is weakly closed on D(F), i.e. for any sequence {x_n} ⊂ D(F) satisfying x_n ⇀ x ∈ X and F(x_n) ⇀ v ∈ Y there hold x ∈ D(F) and F(x) = v;
(c) There is ρ > 0 such that B_{2ρ}(x_0) ⊂ D(F) and (1.1) has a solution in B_ρ(x_0) ∩ D(Θ), where B_ρ(x_0) := {x ∈ X : ‖x − x_0‖ ≤ ρ};
(d) There exists a family {L(x) : X → Y}_{x ∈ B_{2ρ}(x_0)} ⊂ L(X, Y) such that x ↦ L(x) is continuous on B_{2ρ}(x_0) and there is 0 ≤ η < 1 such that
    ‖F(x̃) − F(x) − L(x)(x̃ − x)‖ ≤ η ‖F(x̃) − F(x)‖
for all x̃, x ∈ B_{2ρ}(x_0).

In Assumption 1, the uniform smoothness of Y in (a) is used to guarantee that the duality mapping J_r^Y is single-valued and continuous for each 1 < r < ∞. We do not require F to be Fréchet differentiable; in case F is Fréchet differentiable, we may take L(x) = F′(x), where F′(x) denotes the Fréchet derivative of F at x. The condition in (d) is the so-called tangential cone condition, which has been widely used in the analysis of regularization methods for nonlinear inverse problems and has been verified for several important applications ([3, 0, 4, 6, 30]). How to replace the tangential cone condition by a weaker condition is a challenging issue. Under certain smoothness conditions on the solution, a class of Newton methods in Hilbert spaces has been shown in [9] to be order optimal under merely a Lipschitz condition on the Fréchet derivative of F when the methods are terminated by a discrepancy principle. How to extend such a result to the Banach space setting remains an open problem.

In view of (d) in Assumption 1, it is easily seen that

    ‖F(x̃) − F(x)‖ ≤ (1/(1 − η)) ‖L(x)(x̃ − x)‖,    x̃, x ∈ B_{2ρ}(x_0),

which shows that x ↦ F(x) is continuous on B_{2ρ}(x_0). When X is a reflexive Banach space, by using the uniform convexity of Θ and the weak closedness of F, it is standard to show that x† exists. Moreover, [0, Lemma 3.] gives the following local uniqueness result.

Lemma 3.1 Let Assumption 1 hold. If x† ∈ B_ρ(x_0) ∩ D(Θ), then x† is the unique solution of (1.1) in B_{2ρ}(x_0) ∩ D(Θ) satisfying (3.1).
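Condition (d) is usually verified analytically, but for a concrete discretized forward model one can at least probe it numerically. The following rough sampling check is purely illustrative and not part of the analysis: the callables F and L, the base point x0 and the sampling region are placeholders, and the estimate is only a lower bound on the best admissible η over the sampled pairs.

```python
import numpy as np

# Illustrative probe of the tangential cone condition (d): estimate the smallest eta with
#   ||F(x_tilde) - F(x) - L(x)(x_tilde - x)|| <= eta * ||F(x_tilde) - F(x)||
# over randomly sampled pairs near x0. F(x) returns an array; L(x) returns a linear map
# acting as a function h -> L(x) h. All inputs are user-supplied placeholders.
def estimate_eta(F, L, x0, rho, trials=100, seed=0):
    rng = np.random.default_rng(seed)
    eta = 0.0
    for _ in range(trials):
        x = x0 + rho * rng.uniform(-1.0, 1.0, size=x0.shape)
        x_tilde = x0 + rho * rng.uniform(-1.0, 1.0, size=x0.shape)
        num = np.linalg.norm(F(x_tilde) - F(x) - L(x)(x_tilde - x))
        den = np.linalg.norm(F(x_tilde) - F(x))
        if den > 1e-14:
            eta = max(eta, num / den)
    return eta
```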

In practical applications, instead of y we only have noisy data y^δ satisfying

    ‖y^δ − y‖ ≤ δ        (3.2)

with a small noise level δ > 0. We will use y^δ to construct an approximate solution to (1.1). To formulate our Levenberg-Marquardt algorithm, assuming x_n^δ has been constructed, we consider the linearized equation

    L(x_n^δ)(x − x_n^δ) = y^δ − F(x_n^δ)

and apply to it the Tikhonov regularization whose regularization term is the Bregman distance induced by Θ at x_n^δ. The next iterate x_{n+1}^δ is then constructed by choosing the regularization parameter adaptively. This leads to the following Levenberg-Marquardt algorithm.

Algorithm 1 (Levenberg-Marquardt method with noisy data)
1. Take x_0 ∈ X and ξ_0 ∈ X* such that ξ_0 ∈ ∂Θ(x_0). Pick 0 < µ_0 ≤ µ_1 < 1 and τ > 1.
2. Let x_0^δ := x_0 and ξ_0^δ := ξ_0. Assuming that x_n^δ and ξ_n^δ are well-defined, we define x_{n+1}^δ and ξ_{n+1}^δ as follows:
   (a) For each α > 0 define x_n(α, y^δ) and ξ_n(α, y^δ) as
       x_n(α, y^δ) = arg min_{x∈X} { (1/r) ‖y^δ − F(x_n^δ) − L(x_n^δ)(x − x_n^δ)‖^r + α D_{ξ_n^δ}Θ(x, x_n^δ) },
       ξ_n(α, y^δ) = ξ_n^δ + (1/α) L(x_n^δ)* J_r^Y ( y^δ − F(x_n^δ) − L(x_n^δ)(x_n(α, y^δ) − x_n^δ) );
   (b) Take α_n(y^δ) > 0 to be a number α such that
       µ_0 ‖y^δ − F(x_n^δ)‖ ≤ ‖y^δ − F(x_n^δ) − L(x_n^δ)(x_n(α, y^δ) − x_n^δ)‖ ≤ µ_1 ‖y^δ − F(x_n^δ)‖;
   (c) Define x_{n+1}^δ := x_n(α_n(y^δ), y^δ) and ξ_{n+1}^δ := ξ_n(α_n(y^δ), y^δ).
3. Let n_δ be the first integer such that ‖y^δ − F(x_{n_δ}^δ)‖ ≤ τδ and use x_{n_δ}^δ as an approximate solution.

When X and Y are Hilbert spaces, r = 2 and Θ(x) = ‖x‖²/2, Algorithm 1 reduces to the regularizing Levenberg-Marquardt scheme in [] and each minimizer x_n(α, y^δ) can be written explicitly. In the setting of Algorithm 1 in Banach spaces with general convex regularization terms, x_n(α, y^δ) does not have an explicit formula. This increases the difficulty of the convergence analysis. By making use of tools from convex analysis, in this section we will show that Algorithm 1 is well-defined, and in Section 4 we will show that x_{n_δ}^δ indeed converges to a solution of (1.1) as δ → 0.

In Algorithm 1, we need to pick ξ_0 ∈ X* and x_0 ∈ X such that ξ_0 ∈ ∂Θ(x_0). This can be achieved as follows: pick ξ_0 ∈ X* and define x_0 = arg min_{x∈X} {Θ(x) − ⟨ξ_0, x⟩}; then ξ_0 ∈ ∂Θ(x_0) holds automatically. In applications we usually have Θ ≥ 0 and Θ(0) = 0, and then we can simply take x_0 = 0 and ξ_0 = 0.

From the definition of x_n(α, y^δ) in Algorithm 1 we can see that

    0 ∈ −L(x_n^δ)* J_r^Y ( y^δ − F(x_n^δ) − L(x_n^δ)(x_n(α, y^δ) − x_n^δ) ) + α ( ∂Θ(x_n(α, y^δ)) − ξ_n^δ ).

The definition of ξ_n(α, y^δ) is exactly motivated by this fact, so that ξ_n(α, y^δ) ∈ ∂Θ(x_n(α, y^δ)) for all α > 0.

9 Moreover, by the minimality of x n (α, y δ ), we always have y δ F (x δ n) L(x δ n)(x n (α, y δ ) x δ n) y δ F (x δ n), α > 0. (3.3) In order to prove that Algorithm is well-defined, we need to show that the number α n (y δ ) used to define x δ n+ from x δ n exists, each x δ n is in D(F ), and the iteration terminates after n δ < steps. We achieve these via a series of results. Lemma 3. Let X be reflexive and let Θ : X (, ] be proper, weakly lower semicontinuous and uniformly convex. Then, for each α > 0, x n (α, y δ ) is uniquely determined. Moreover, the mapping α x n (α, y δ ) is continuous over (0, ), and the function α y δ F (x δ n) L(x δ n)(x n (α, y δ ) x δ n) (3.4) is continuous and monotonically increasing over (0, ). Proof All assertions, except the monotonicity of (3.4), follow from Lemma.. The monotonicity of (3.4) can be proved by a standard argument, see [3, Lemma 9..] or [7, Lemma 6.] for instance. Lemma 3.3 Let Θ : X (, ] be a proper, weakly lower semi-continuous function that is uniformly convex in the sense of (.). Let Assumption hold with 0 η <. Let η < µ 0 < and τ > ( + η)/(µ 0 η). Assume that x δ n and ξ δ n are well-defined for some 0 n < n δ with Then for any α > 0 such that D ξ δ n Θ(x, x δ n) D ξ0 Θ(x, x 0 ) ϕ(ρ). (3.5) y δ F (x δ n) L(x δ n)(x n (α, y δ ) x δ n) µ 0 y δ F (x δ n) (3.6) there hold x n (α, y δ ) B ρ (x 0 ) and D ξn(α,y δ )Θ(ˆx, x n (α, y δ )) D ξ δ n Θ(ˆx, x δ n) c 0µ r 0 α yδ F (x δ n) r (3.7) for any solution ˆx of (.) in B ρ (x 0 ) D(Θ), where c 0 := ( + η + τη)/(τµ 0 ). Proof For simplicity of exposition, we write x n (α) := x n (α, y δ ), ξ n (α) := ξ n (α, y δ ) and L n := L(x δ n). By using the identity (.) and the nonnegativity of the Bregman distance, we obtain D ξn(α)θ(ˆx, x n (α)) D ξ δ n Θ(ˆx, x δ n) ξ n (α) ξ δ n, x n (α) ˆx. By the definition of ξ n (α) we then have We can write D ξn(α)θ(ˆx, x n (α)) D ξ δ n Θ(ˆx, x δ n) α J Y r (y δ F (x δ n) L n (x n (α) x δ n)), L n (x n (α) ˆx). L n (x n (α) ˆx) = [y δ F (x δ n) L n (ˆx x δ n)] [y δ F (x δ n) L n (x n (α) x δ n)]. Then, by virtue of the property of the duality mapping J Y r, we obtain D ξn(α)θ(ˆx, x n (α)) D ξ δ n Θ(ˆx, x δ n) α yδ F (x δ n) L n (x n (α) x δ n) r + α yδ F (x δ n) L n (x n (α) x δ n) r y δ F (x δ n) L n (ˆx x δ n).

0 In view of (3.5) and (.3), we have x δ n x ρ and x x 0 ρ which implies that x δ n B ρ (x 0 ). Thus we may use (3.) and Assumption (d) to derive that y δ F (x δ n) L n (ˆx x δ n) ( + η)δ + η y δ F (x δ n). Since n < n δ we have F (x δ n) y δ τδ. Thus Therefore y δ F (x δ n) L n (ˆx x δ n) + η + τη y δ F (x δ τ n). (3.8) D ξn(α)θ(ˆx, x n (α)) D ξ δ n Θ(ˆx, x δ n) α yδ F (x δ n) L n (x n (α) x δ n) r + + η + τη y δ F (x δ τα n) L n (x n (α) x δ n) r y δ F (x δ n). In view of the inequality (3.6), we thus obtain D ξn(α)θ(ˆx, x n (α)) D ξ δ n Θ(ˆx, x δ n) c 0 α yδ F (x δ n) L n (x n (α) x δ n) r. where c 0 := ( + η + τη)/(τµ 0 ). According to the conditions on µ 0 and τ, we have c 0 > 0. Thus, in view of (3.6) again, we obtain (3.7). Finally, by using (3.7) with ˆx = x and (3.5) we have D ξn(α)θ(x, x n (α)) D ξ δ n Θ(x, x δ n) D ξ0 Θ(x, x 0 ) ϕ(ρ). This together with (.3) and x 0 x ρ implies that x n (α) B ρ (x 0 ). Proposition 3.4 Let Θ : X (, ] be proper, lower semi-continuous and uniformly convex in the sense of (.). Let Assumption hold with 0 η < /3 and let η < µ 0 µ < η and τ > ( + η)/(µ 0 η). Assume that D ξ0 Θ(x, x 0 ) ϕ(ρ). (3.9) Then x δ n are well-defined for all 0 n n δ and Algorithm terminates after n δ < iterations with n δ = O( + log δ ). Moreover, for any solution ˆx of (.) in B ρ (x 0 ) D(Θ) there hold D ξ δ n+ Θ(ˆx, x δ n+) D ξ δ n Θ(ˆx, x δ n) (3.0) and ( ) α n (y δ ) yδ F (x δ n) r C 0 D ξ δ n Θ(ˆx, x δ n) D ξ δ n+ Θ(ˆx, x δ n+) for all 0 n < n δ, where C 0 = /(c 0 µ r 0). (3.) Proof We first show that if D ξ δ n Θ(x, x δ n) D ξ0 Θ(x, x 0 ) ϕ(ρ) (3.) for some 0 n < n δ, then there exists α n (y δ ) > 0 such that where µ 0 y δ F (x δ n) f(α n (y δ )) µ y δ F (x δ n), (3.3) f(α) = y δ F (x δ n) L(x δ n)(x n (α, y δ ) x δ n) which is continuous and monotonically increasing, see Lemma 3.. To see this, we may use the minimality of x n (α, y δ ) to obtain αd ξ δ n Θ(x n (α, y δ ), x δ n) r yδ F (x δ n) r, α > 0.

This implies that lim α D ξ δ n Θ(x n (α, y δ ), x δ n) = 0 and hence lim α x n (α, y δ ) x δ n = 0 by the uniform convexity of Θ. Consequently lim f(α) = α yδ F (x δ n) > µ y δ F (x δ n). To show the existence of a finite α n (y δ ) satisfying (3.3), it suffices to show that Suppose this is not true, then Thus, we may use Lemma 3.3 to obtain lim f(α) < µ 0 y δ F (x δ n). α 0 f(α) µ 0 y δ F (x δ n), α > 0. c 0 µ r 0 α yδ F (x δ n) r D ξ δ n Θ(x, x δ n), α > 0. Taking α 0 gives y δ = F (x δ n) which is absurd since y δ F (x δ n) > τδ for n < n δ. We next show (3.) by induction on n. It is trivial for n = 0. Assume that it is true for n = m with m < n δ. Since x δ m+ = x m (α m (y δ ), y δ ) and ξ δ m+ = ξ m (α m (y δ ), y δ ), we may use Lemma 3.3 to conclude that D ξ δ m+ Θ(x, x δ m+) D ξ δ m Θ(x, x δ m) which together with the induction hypothesis shows (3.) for n = m +. Since (3.) holds true for all 0 n < n δ, we may use Lemma 3.3 to obtain (3.0) and (3.) immediately. Finally, the finiteness of n δ can be proved by a standard argument from []. For completeness we include the argument here. By Assumption (d) and the definition of x δ n+ we have for all 0 n < n δ that y δ F (x δ n+) y δ F (x δ n) L(x δ n)(x δ n+ x δ n) + F (x δ n+) F (x δ n) L(x δ n)(x δ n+ x δ n) µ y δ F (x δ n) + η F (x δ n+) F (x δ n) (µ + η) y δ F (x δ n) + η y δ F (x δ n+). This implies that y δ F (x δ n+) q y δ F (x δ n) with q = µ + η η < (3.4) and hence y δ F (x δ n) q n y δ F (x 0 ), 0 n < n δ. (3.5) If n δ =, then we must have y δ F (x δ n) > τδ for all n. But the inequality (3.5) implies y δ F (x δ n) 0 as n. Therefore n δ <. Now we take n = n δ in (3.5) and obtain q n δ y δ F (x 0 ) > τδ. This implies n δ = O( + log δ ).
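Before turning to the convergence analysis, it may help to see the overall structure of Algorithm 1 in schematic form. The following Python sketch is purely illustrative and is not the implementation used in Section 5: the callables F, L, inner_solve (the minimization in step 2(a)) and new_subgrad (the subgradient update) are placeholders for the problem-specific pieces, and the simple search over α only exploits the monotonicity of α ↦ ‖y^δ − F(x_n^δ) − L(x_n^δ)(x_n(α, y^δ) − x_n^δ)‖ established in this section.

```python
import numpy as np

# Schematic outer loop of Algorithm 1 (illustration only).
#   F(x)                                  -> forward operator, returns an array
#   L(x)                                  -> linearization, returns a map h -> L(x) h
#   inner_solve(x_n, xi_n, alpha, y_d)    -> minimizer x_n(alpha, y_delta) of step 2(a)
#   new_subgrad(x_n, xi_n, alpha, x_new, y_d) -> updated subgradient xi_n(alpha, y_delta)
def levenberg_marquardt(F, L, inner_solve, new_subgrad, x0, xi0, y_delta, delta,
                        mu0=0.90, mu1=0.96, tau=1.1, alpha_init=1.0, max_iter=50):
    x, xi = x0, xi0
    alpha = alpha_init
    for _ in range(max_iter):
        res = y_delta - F(x)
        if np.linalg.norm(res) <= tau * delta:        # discrepancy principle, step 3
            break
        lo, hi = None, None
        for _ in range(60):                           # adjust alpha until step 2(b) holds
            x_trial = inner_solve(x, xi, alpha, y_delta)
            lin_res = np.linalg.norm(res - L(x)(x_trial - x))
            if lin_res < mu0 * np.linalg.norm(res):   # alpha too small: increase it
                lo = alpha
                alpha = 2.0 * alpha if hi is None else 0.5 * (lo + hi)
            elif lin_res > mu1 * np.linalg.norm(res): # alpha too large: decrease it
                hi = alpha
                alpha = 0.5 * alpha if lo is None else 0.5 * (lo + hi)
            else:
                break                                  # accepted alpha matches x_trial
        xi = new_subgrad(x, xi, alpha, x_trial, y_delta)
        x = x_trial
    return x
```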

4 Convergence analysis In this section we will show that Algorithm is a regularization method for solving (.), that is, we will show that if (x δ n, ξ δ n) X X, 0 n n δ, are defined by Algorithm using noisy data y δ, then x δ n δ converges to x as y δ y. To this end, it is necessary to investigate for each fixed n the behavior of x δ n as y δ y. This leads us to consider the counterpart of Algorithm with exact data which is formulated as Algorithm below. Due to the non-unique determination of α n, this algorithm actually defines many distinct iterative sequences. We will show that every sequence defined by Algorithm is convergent. This convergence result however is not enough for showing the regularization property of Algorithm. Indeed, for each fixed n, when taking y δ y, the sequence {α n (y δ )} used to define x δ n from x δ n may split into many convergent subsequences and so does {x δ n, ξ δ n} with limits defined by Algorithm. This forces us to establish a uniform convergence result for all the possible sequences defined by Algorithm. The regularization property of Algorithm is then established by using a stability result which connects Algorithm and Algorithm. 4. The method with exact data We start with the formulation of the counterpart of Algorithm when the exact data is used. Algorithm (Levenberg-Marquardt method with exact data). Let 0 < µ 0 µ < and (x 0, ξ 0 ) X X be the same as in Algorithm.. Assume that x n and ξ n are defined. If F (x n ) = y, we define x n+ = x n and ξ n+ = ξ n ; otherwise, we define x n+ and ξ n+ as follows: (a) For each α > 0 we define x n (α, y) and ξ n (α, y) as x n (α, y) = arg min x X { r y F (x n) L(x n )(x x n ) r + αd ξn Θ(x, x n ) ξ n (α, y) = ξ n + α L(x n) J Y r (y F (x n ) L(x n )(x n (α, y) x n )) ; (b) Take α n > 0 to be a number such that µ 0 y F (x n ) y F (x n ) L(x n )(x n (α n, y) x n ) µ y F (x n ) ; (c) Define x n+ := x n (α n, y) and ξ n+ := ξ n (α n, y). In the formulation of Algorithm, we take α n > 0 to be any number satisfying (b) when defining x n+, ξ n+ from x n, ξ n. There might have many possible choices of α n ; different choice of {α n } may lead to different iterative sequence. We will use Γ µ0,µ (x 0, ξ 0 ) to denote the set of all possible sequence {(x n, ξ n )} in X X constructed from (x 0, ξ 0 ) by Algorithm with α n > 0 chosen to be any number satisfying (b). By using the same argument in the proof of Proposition 3.4, we can obtain the following result which shows that each sequence in Γ µ0,µ (x 0, ξ 0 ) is well-defined and admits certain monotonicity property. Lemma 4. Let Assumption hold with 0 η < and let η < µ 0 µ <. Then any sequence {(x n, ξ n )} Γ µ0,µ (x 0, ξ 0 ) is well-defined and for any solution ˆx of (.) in B ρ (x 0 ) D(Θ) there hold for all n 0. D ξn+ Θ(ˆx, x n+ ) D ξn Θ(ˆx, x n ), (4.) y F (x n ) r ( Dξn Θ(ˆx, x n ) D ξn+ α n µ 0 η n+) ) (4.) },

3 In order to derive the convergence of every sequence {(x n, ξ n )} in Γ µ0,µ (x 0, ξ 0 ), we will use the following result which gives a general convergence criterion. Proposition 4. Consider the equation (.) for which Assumption holds. Let Θ : X (, ] be a proper, lower semi-continuous and uniformly convex function. Let {x n } B ρ (x 0 ) D(Θ) and {ξ n } X be such that (i) ξ n Θ(x n ) for all n; (ii) for any solution ˆx of (.) in B ρ (x 0 ) D(Θ) the sequence {D ξn Θ(ˆx, x n )} is monotonically decreasing; (iii) lim n F (x n ) y = 0. (iv) there is a constant C such that for all k > n and any solution ˆx of (.) in B ρ (x 0 ) D(Θ) there holds ξ k ξ n, x k ˆx C (D ξn Θ(ˆx, x n ) D ξk Θ(ˆx, x k )). (4.3) Then there exists a solution x of (.) in B ρ (x 0 ) D(Θ) such that lim x n x = 0, n lim Θ(x n) = Θ(x ) and lim D ξ n Θ(x, x n ) = 0. n n If, in addition, x B ρ (x 0 ) D(Θ) and ξ n+ ξ n R(L(x ) ) for all n, then x = x. Proof This result follows from [0, Proposition 3.6] and its proof. Now we can prove the main convergence result on Algorithm. Theorem 4.3 Let Assumption hold with 0 η < /3 and let η < µ 0 µ < η. Let Θ : X (, ] be proper, lower semi-continuous and uniformly convex in the sense of (.), and let (3.9) be satisfied. Then for any {(x n, ξ n )} Γ µ0,µ (x 0, ξ 0 ) there exists a solution x of (.) in B ρ (x 0 ) D(Θ) such that lim x n x = 0, n lim Θ(x n) = Θ(x ) and lim D ξ n Θ(x, x n ) = 0. n n If in addition N (L(x )) N (L(x)) for all x B ρ (x 0 ) D(F ), then x = x. Proof We will use Proposition 4.. By the definition of ξ n in Algorithm we always have ξ n Θ(x n ) for all n 0 which shows (i) in Proposition 4.. Lemma 4. shows (ii) in Proposition 4.. By the similar argument for deriving (3.4) we can show that y F (x n+ ) q y F (x n ), n 0 (4.4) with q = (µ + η)/( η) <. This implies (iii) in Proposition 4.. In order to show the convergence result, it remains only to show (iv) in Proposition 4.. To this end, for 0 l < m < we may use the definition of ξ n and the property of the duality mapping Jr Y to obtain m ξ m ξ l, x m x = ξ n+ ξ n, x m x n=l m = Jr Y (y F (x n ) L(x n )(x n+ x n )), L(x n )(x m x ) α n n=l m n=l α n y F (x n ) L(x n )(x n+ x n ) r L(x n )(x m x ).

4 By the triangle inequality L(x n )(x m x ) L(x n )(x n x ) + L(x n )(x m x n ) and Assumption (d), we have L(x n )(x m x ) ( + η) ( y F (x n ) + F (x n ) F (x m ) ) ( + η) ( y F (x n ) + y F (x m ) ). Since the inequality (4.4) implies that { y F (x n ) } monotonically decreasing, we have Therefore L(x n )(x m x ) 3( + η) y F (x n ), 0 n m. ξ m ξ l, x m x m 3( + η) y F (x n ) L(x n )(x n+ x n ) r y F (x n ) α n n=l m 3( + η) y F (x n ) r. α n n=l In view of (4.) in Lemma 4., we obtain with C := 3( + η)/(µ 0 η) that ξ m ξ l, x m x C (D ξl Θ(x, x l ) D ξm Θ(x, x m )) which shows (iv) in Proposition 4.. To show the last part under the condition N (L(x )) N (L(x)) for x B ρ (x 0 ) D(Θ), we observe from the definition of ξ n that ξ n+ ξ n R(L(x n ) ) N (L(x n )) N (L(x )) = R(L(x ) ). Thus, we may use the second part of Proposition 4. to conclude the proof. 4. A uniform convergence result In Theorem 4.3 we have shown the convergence of every sequence in Γ µ0,µ (x 0, ξ 0 ). In this subsection we will strengthen this result by showing the following uniform convergence result for all sequences in Γ µ0,µ (x 0, ξ 0 ) which will be crucial in establishing the regularization property of Algorithm. Proposition 4.4 Assume all the conditions in Theorem 4.3 hold. Assume also that N (L(x )) N (L(x)), x B ρ (x 0 ) D(Θ). (4.5) Then, for any ε > 0, there is an integer n(ε) such that for any sequence {(ξ n, x n )} Γ µ0,µ (ξ 0, x 0 ) there holds D ξn Θ(x, x n ) < ε for all n n(ε). The proof of Proposition 4.4 is based on some preliminary results. It is easily seen that, to complete the proof, we only need to consider the case F (x 0 ) y. The following result shows that in this case we always have F (x n ) y for all n 0 for any sequence {(x n, ξ n )} Γ µ0,µ (x 0, ξ 0 ). Thus, when defining x n+ from x n in Algorithm we always use a finite number α n > 0. Lemma 4.5 Let all the conditions in Proposition 4.4 hold. For any sequence {(x n, ξ n )} Γ µ0,µ (x 0, ξ 0 ), if F (x n ) = y for some n, then F (x 0 ) = y.

5 Proof It suffices to show that if F (x k ) = y for some k n, then F (x k ) = y. By using Assumption (d) and F (x k ) = y we have L(x )(x k x ) = 0. Thus, in view of (4.5), we have x k x N (L(x )) N (L(x k )). Consequently L(x k )(x k x ) = 0. If F (x k ) y, then by the definition of x k and Assumption (d) we have µ 0 y F (x k ) y F (x k ) L(x k )(x k x k ) = y F (x k ) L(x k )(x x k ) η y F (x k ) which is impossible since µ 0 > η. The proof is thus complete. The next result shows that, if F (x n ) y, then we can give the upper and lower bounds on the number α n used to define x n+ from x n. Lemma 4.6 Let all the conditions in Theorem 4.3 hold. If F (x n ) y, then for any α satisfying µ 0 y F (x n ) y F (x n ) L(x n )(x n (α, y) x n ) µ y F (x n ) (4.6) there holds 0 < α n α α n <, where α n := (µr 0 η r ) y F (x n ) r rϕ(ρ) and α n := y F (x n ) r rϕ(( µ ) y F (x n ) / L(x n ) ). Proof By the definition of x n (α, y) and the uniform convexity of Θ, we have αϕ( x n (α, y) x n ) αd ξn Θ(x n (α, y), x n ) r y F (x n) r. In view of the second inequality in (4.6) we can obtain L(x n ) x n (α, y) x n L(x n )(x n (α, y) x n ) ( µ ) y F (x n ). Consequently ( ) ( µ ) y F (x n ) αϕ L(x n ) r y F (x n) r (4.7) which implies that α α n. On the other hand, by the definition of x n (α, y) we have r y F (x n) L(x n )(x n (α, y) x n ) r r y F (x n) L(x n )(x x n ) r + αd ξn Θ(x, x n ). In view of the first inequality in (4.6), Assumption (d), and the inequality D ξn Θ(x, x n ) D ξ0 Θ(x, x 0 ) ϕ(ρ) from Lemma 4., it follows that µ r 0 y F (x n ) r η r y F (x n ) r + rαϕ(ρ). This implies that α α n. Now we are ready to give the proof of Proposition 4.4. We will use an idea from [0] which is based on the well-known diagonal sequence argument.

6 Proof of Proposition 4.4. We may assume that F (x 0 ) y. We will use a contradiction argument. Assume that the result is not true. Then there is an ε 0 > 0 such that for any l there exist {(x (l) n, ξ (l) )} Γ µ0,µ (x 0, ξ 0 ) and > l such that n D ξ (l) Θ(x, x (l) ) ε 0. (4.8) We will construct, for each n = 0,,, a strictly increasing subsequence {l n,k } of positive integers and ( x n, ξ n ) X X such that (i) {( x n, ξ n )} Γ µ0,µ (x 0, ξ 0 ); (ii) for each fixed n there hold x (l n,k) n k there hold D ξ (l n,k ) n x n and ξ (l n,k) n Θ( x n, x (l n,k) n ) ε 0 /4 and ξ n as k. Moreover, for all ξ (l n,k) n ξ n, x n x ε 0 /4. Assume that the above construction is available, we will derive a contradiction. According to (i), we may use Theorem 4.3 to conclude that D ξn Θ(x, x n ) 0 as n. Thus we can pick a large integer ˆn such that D ξˆn Θ(x, xˆn ) < ε 0 /. Let ˆl := lˆn,ˆn and consider the sequence {(x (ˆl) n, ξ n (ˆl) )}. According to (ii), we have ( ) ε 0 / > Θ(x, D ξˆn xˆn ) D (ˆl) Θ(x, x (ˆl) ξ ˆn ) + D (ˆl) Θ(x, x (ˆl) ˆn ξ ˆn ) ˆn = D (ˆl) Θ( xˆn, x (ˆl) ξ ˆn ˆn ) + ξˆn ξ (ˆl) ˆn ε 0 /4 ε 0 /4 + D (ˆl) Θ(x, x (ˆl) ξ ˆn ). ˆn, xˆn x + D (ˆl) Θ(x, x (ˆl) ξ ˆn ) ˆn Since {lˆn,k } is strictly increasing, we have nˆl > ˆl = lˆn,ˆn ˆn. Therefore, we may use Lemma 4. to obtain D (ˆl) Θ(x, x (ˆl) ξ nˆl nˆl ) D ξ (ˆl) ˆn Θ(x, x (ˆl) ˆn ) < ε 0 which is a contradiction to (4.8) with l = ˆl. We turn to the construction of {l n,k } and ( x n, ξ n ), for each n = 0,,, such that (i) and (ii) hold. For n = 0, we take ( x 0, ξ 0 ) = (x 0, ξ 0 ) and l 0,k = k for all k. Since ξ (k) 0 = ξ 0 and x (k) 0 = x 0, (ii) holds automatically for n = 0. Next, assume that we have constructed {l n,k } and ( x n, ξ n ) for all 0 n m. We will construct {l m+,k } and ( x m+, ξ m+ ). Since F (x 0 ) y, we have from Lemma 4.5 that F ( x m ) y and F (x (l) m ) y for all l. Let α m (l) > 0 be the number used to define (x (l) m+, ξ(l) m+ ) from (x(l) m, ξ m (l) ). From Lemma 4.6 and the induction hypothesis x (l m,k) m x m we can conclude that there are two positive numbers α m and ᾱ m independent of k such that α m α (l m,k) m ᾱ m for all k. Thus {l m,k } must have a subsequence, denoted as {l m+,k }, such that {α (l m+,k) m } converges to some number α m (0, ) as k. We define x m+ = arg min x X { } r y F ( x m) L( x m )(x x m ) r + α m D ξm Θ(x, x m ), ξ m+ = ξ m + α m L( x m ) J Y r (y F ( x m ) L( x m )( x m+ x m )). It is clear that x m+ = x m (α m, y) and ξ m+ = ξ m (α m, y). In view of the induction hypotheses x (l m+,k) m x m and ξ (l m+,k) m ξ m, the continuity of x F (x) and x

7 L(x), and α (l m+,k) m α m, we may use Lemma. and the continuity of the duality mapping Jr Y to conclude that x (l m+,k) m+ x m+, Θ(x (l m+,k) m+ ) Θ( x m+ ) and ξ (l m+,k) m+ ξ m+ (4.9) as k. According to the choice of α (l m+,k) m have and thus the definition of x (l m+,k) m+, we µ 0 y F (x (l m+,k) m Letting k gives ) y F (x (l m+,k) m µ y F (x (l m+,k) m ). ) L(x (l m+,k) m )(x (l m+,k) m+ x (l m+,k) m ) µ 0 y F ( x m ) y F ( x m ) L( x m )( x m+ x m ) µ y F ( x m ). Thus x m+ = x m (α m, y) satisfies the desired requirement. We therefore complete the construction of {l m+,k } and ( x m+, ξ m+ ). We need to show that x m+ and ξ m+ satisfy the estimates in (ii) for n = m +. We may use (4.9) to obtain lim k ξ(l m+,k) m+ ξ m+, x m+ x = 0, lim D k ξ (l m+,k ) m+ Θ( x m+, x (l m+,k) m+ ) = 0 Consequently, by taking a subsequence of {l m+,k } if necessary, which is still denoted by the same notation, we can guarantee (ii) for n = m +. 4.3 Regularization property In this section we will establish the regularization property of Algorithm which is stated in the following result. Theorem 4.7 Let Θ : X (, ] be a proper, lower semi-continuous function that is uniformly convex in the sense of (.), and let Assumption hold with 0 η < /3. Let η < µ 0 µ < η and τ > ( + η)/(µ 0 η), and let (3.9) be satisfied. Assume further that N (L(x )) N (L(x)), x B ρ (x 0 ) D(Θ). Then for x δ n δ X and ξ δ n δ X defined by Algorithm there hold lim δ 0 xδ n δ x = 0, lim δ 0 Θ(x δ n δ ) = Θ(x ) and lim δ 0 D ξ δ nδ Θ(x, x δ n δ ) = 0. In order to prove Theorem 4.7, we will need to establish certain stability result to connect Algorithm and Algorithm so that Proposition 4.4 can be used. The following stability result is sufficient for our purpose. Lemma 4.8 Let F (x 0 ) y and let all the conditions in Theorem 4.7 hold. Let {y δ l } be a sequence of noisy data satisfying y δ l y δ l with δ l 0 as l. Let x δ l n and ξ δ l n, 0 n n δl, be defined by Algorithm. Then for any finite n lim inf l n δl, by taking a subsequence of {y δ l } if necessary, there is a sequence {(x m, ξ m )} Γ µ0,µ (x 0, ξ 0 ) such that for all 0 m n. x δ l m x m, ξ δ l m ξ m and Θ(x δ l m) Θ(x m ) as l

8 Proof Since F (x 0 ) y, we must have lim inf l n δl. We will use an induction argument on n. When n = 0, nothing needs to prove since x δ l 0 = x 0 and ξ δ l 0 = ξ 0. Assume next that, for some 0 n < lim inf l n δl, the result is true for some sequence {(x m, ξ m )} Γ µ0,µ (x 0, ξ 0 ) with 0 m n. In order to show the result is also true for n +, we will obtain a sequence from Γ µ0,µ (x 0, ξ 0 ) by retaining the first n + terms in {(x m, ξ m )} and modifying the remaining terms. It suffices to redefine x n+ and ξ n+ since then we can apply Algorithm to produce the remaining terms. Since F (x 0 ) y, we have from Lemma 4.5 that F (x n ) y. Let α n (y δ l ) be the number used to define x δ l n+ and ξδ l n+. Since the induction hypothesis xδ l n x n and the fact y δ l y imply F (x δ l n ) y δ l F (x n ) y > 0 as l, we may use the similar argument in the proof of Lemma 4.6 to conclude that α n α n (y δ l ) ᾱ n for two numbers 0 < α n ᾱ n < independent of l. Therefore, by taking a subsequence of {y δ l } if necessary, we may assume that α n (y δ l ) α n as l for some number α n (0, ). We define { } x n+ = arg min x X r y F (x n) L(x n )(x x n ) r + α n D ξn Θ(x, x n ), ξ n+ = ξ n + L(x n ) Jr Y (y F (x n ) L(x n )(x n+ x n )). α n In view of the induction hypotheses and the continuity of x F (x) and x L(x), we may use Lemma. and the continuity of Jr Y to conclude that Moreover, since x δ l n+ x n+, Θ(x δ l n+ ) Θ(x n+) and ξ δ l n+ ξ n+ as l. µ 0 y δ l F (x δ l n ) y δ l F (x δ l n ) L(x δ l n )(x δ l n+ xδ l n ) µ y δ l F (x δ l n ), by taking l we can conclude that µ 0 y F (x n ) y F (x n ) L(x n )(x n+ x n ) µ y F (x n ). Therefore x n+ = x n (α n, y) and ξ n+ = ξ n (α n, y) are the desired elements to be defined. The proof is thus complete. The next result will be used to prove lim δ 0 Θ(x δ n δ ) = Θ(x ) in Theorem 4.7. Lemma 4.9 Let all the conditions in Theorem 4.7 hold and let {(x δ n, ξn)} δ 0 n nδ defined by Algorithm. Then for all 0 l n δ there holds be where C = ( + η)(3τ + )/(τc 0 µ r 0). ξ δ n δ ξ δ l, x x δ n δ C D ξ δ l Θ(x, x δ l ),

9 Proof By the definition of ξ δ n, the property of J Y r and (3.3), we can obtain ξ δ nδ ξl δ, x x δ n δ n δ ξ δ n+ ξn, δ x x δ n δ n δ n=l n δ n=l n=l α n (y δ ) yδ F (x δ n) L(x δ n)(x δ n+ x δ n) r L(x δ n)(x x δ n δ ) α n (y δ ) yδ F (x δ n) r L(x δ n)(x x δ n δ ). By Assumption (d) and (3.4) we can derive that L(x δ n)(x x δ n δ ) ( + η) ( δ + 3 y δ F (x δ n) ), 0 n < n δ. Since y δ F (x δ n) > τδ for 0 n < n δ, we therefore have Consequently L(x δ n)(x x δ n δ ) ( + η)(3τ + ) y δ F (x δ τ n), 0 n < n δ. ξ n δ δ ξl δ, x x δ n δ n ( + η)(3τ + ) δ τ α n (y δ ) yδ F (x δ n) r. n=l This together with (3.) in Proposition 3.4 implies the desired estimate. We are now ready to prove Theorem 4.7, the main result of this paper. Proof of Theorem 4.7. We may assume that F (x 0 ) y. We first claim that lim n δ =. (4.0) δ 0 Suppose that this is not true. Then there exists {y δ l } satisfying y δ l y δ l with δ l 0 such that n δl ˆn as l for some finite integer ˆn. Thus n δl = ˆn for large l. By the definition of n δl we have F (x δ l ˆn ) yδ l τδ l. (4.) In view of Lemma 4.8, by taking a subsequence of {y δ l } if necessary, we can find {(x n, ξ n )} Γ µ0,µ (x 0, ξ 0 ) such that x δ l ˆn xˆn as l. Letting l in (4.) gives F (xˆn ) = y. Consequently, by Lemma 4.5, we must have F (x 0 ) = y which is a contradiction. We next show the convergence result. We first prove that lim D ξ Θ(x, x δ n δ 0 δ δ n δ ) = 0 (4.) by a contradiction argument. Suppose that (4.) is not true. Then there exist a number ε > 0 and a sequence {y δ l } satisfying y δ l y δ l with δ l 0 as l such that D ξ δ l Θ(x, x δ l ) ε for all l, (4.3) where := n δl. According to Proposition 4.4, there is an integer n(ε) such that D ξn(ε) Θ(x, x n(ε) ) < ε, {(x n, ξ n )} Γ µ0,µ (x 0, ξ 0 ). (4.4)

0 For this n(ε), by using Lemma 4.8 and by taking a subsequence of {y δ l } if necessary, we can find {(x n, ξ n )} Γ µ0,µ (x 0, ξ 0 ) such that x δ l n x n, ξ δ l n ξ n and Θ(x δ l n ) Θ(x n ) as l (4.5) for 0 n n(ε). Since (4.0) implies that > n(ε) for large l, by using Proposition 3.4 we have D δ ξ l Θ(x, x δ l n ) D δ l ξ l Θ(x, x δ l n(ε) ) = Θ(x ) Θ(x δ l n(ε) ) ξδ l n(ε), x x δ l n(ε). n(ε) In view of (4.5) and (4.4), we therefore obtain lim sup D δ ξ l Θ(x, x δ l n ) Θ(x ) lim Θ(x δ l l l l n(ε) ) lim l ξδ l n(ε), x x δ l n(ε) = Θ(x ) Θ(x n(ε) ) ξ n(ε), x x n(ε) = D ξn(ε) Θ(x, x n(ε) ) < ε. This is a contradiction to (4.3). We thus obtain (4.). By virtue of the uniform convexity of Θ, we then have lim δ 0 x δ n δ x = 0. It remains only to show that lim δ 0 Θ(x δ n δ ) = Θ(x ). In view of (4.), it suffices to show that lim δ 0 ξδ n δ, x x δ n δ = 0. (4.6) We again use a contradiction argument by assuming that there is a number ε > 0 and a sequence {y δ l } satisfying y δ l y δ l with δ l 0 as l such that ξ δ l, x x δ l C ε for all l, (4.7) where C is the constant defined in Lemma 4.9. For this ε, we may use Proposition 4.4 and Lemma 4.8 to find an integer n(ε) such that (4.4) and (4.5) hold. In view of Lemma 4.9, we have ξ δ l, x x δ l ξ δ l ξ δ l n(ε), x x δ l + ξ δ l n(ε), x x δ l C D δ ξ l Θ(x, x δ l n(ε) ) + ξδ l n(ε), x x δ l. n(ε) By taking l, using x δ l x, (4.5) and (4.4), we can obtain lim sup ξ δ l, x x δ l C lim D δ l l ξ l Θ(x, x δ l n(ε) ) = C D ξn(ε) Θ(x, x n(ε) ) < C ε n(ε) which contradicts (4.7). We therefore obtain (4.6) and complete the proof of Theorem 4.7. 5 Numerical results We consider the identification of the parameter c in the boundary value problem { u + cu = f in Ω, u = g on Ω (5.) from the measurements of the state variable u in Ω, where Ω R d, d, is a bounded domain with Lipschitz boundary Ω, f H (Ω) and g H / ( Ω). This is a benchmark example of nonlinear inverse problems. It is well known that (5.) has a unique solution u = u(c) H (Ω) for each c in the domain D := { c L (Ω) : c ĉ L (Ω) γ 0 for some ĉ 0, a.e. }

with some γ_0 > 0. By the Sobolev embedding H¹(Ω) ⊂ L^r(Ω), it makes sense to define the parameter-to-solution map F : D ⊂ L²(Ω) → L^r(Ω) with F(c) = u(c) for any 1 < r < ∞. We consider the problem of identifying c ∈ L²(Ω) from an L^r(Ω)-measurement of u. This amounts to solving F(c) = u. It is known that F is Fréchet differentiable; the Fréchet derivative and its Banach space adjoint are given respectively by

    F′(c)h = −A(c)^{−1}(h u(c)),    F′(c)* w = −u(c) A(c)^{−1} w,    h ∈ L²(Ω), w ∈ L^{r*}(Ω),        (5.2)

where r* is the conjugate exponent of r, i.e. 1/r + 1/r* = 1, and A(c) : H²(Ω) ∩ H¹_0(Ω) → L²(Ω) is defined by A(c)u = −Δu + cu. Recall that in the space L^r(Ω) with 1 < r < ∞ the duality mapping J_r : L^r(Ω) → L^{r*}(Ω) is given by

    J_r(ϕ) := |ϕ|^{r−1} sign(ϕ),    ϕ ∈ L^r(Ω).

For this parameter identification problem, the tangential cone condition has been verified for r = 2 in [,6]; the verification for general 1 < r < ∞, however, is not yet available. In the following we report some numerical results for this inverse problem to indicate the performance of Algorithm 1 with various choices of the convex function Θ and the Banach spaces X and Y. The main computational cost stems from solving the convex minimization problems involved in the algorithm, which requires numerical solutions of differential equations related to calculating the Fréchet derivatives and their adjoints. We use BFGS, one of the most popular quasi-Newton methods, for Example 5.1 and a restarted nonlinear CG method for Example 5.2 and Example 5.3 below; see [9]. Some fast algorithms have been developed for solving convex optimization problems in recent years, including the fast proximal gradient method ([4]) and the primal-dual hybrid gradient methods ([33, 5]). These methods are powerful for problems to which fast solvers such as the FFT are applicable. When fast algorithms are not applicable, as in our computation, they might not have much advantage over other types of methods.

Example 5.1 Consider the one-dimensional problem on the interval Ω = (0, 1) with the source term f(t) = 100 e^{−10(t−0.5)²} and prescribed Dirichlet boundary values u(0) and u(1). We identify the sought solution c†(t) = 5t²(1 − t) + sin²(πt) using noisy data that contain a few data points, called outliers, which are highly inconsistent with the other data points. The appearance of outliers may arise from procedural measurement errors. In Figure 5.1 we present the numerical results of Algorithm 1 with τ = 1.1, µ_0 = 0.90 and µ_1 = 0.96 using the initial guess c_0 = 0 and ξ_0 = 0. In order to carry out the computation, the differential equations involved are solved by a finite difference method obtained by dividing Ω = (0, 1) into 400 subintervals of equal length. The minimization problems involved in the algorithm are solved by performing 50 iterations of the BFGS method. Figure 5.1 (a) and (d) show plots of the noisy data; the one in (a) contains only Gaussian noise, while the one in (d) contains not only Gaussian noise but also 10% outliers. Figure 5.1 (b) and (e) present the reconstruction results of the regularizing Levenberg-Marquardt scheme of Hanke, i.e. Algorithm 1 with X = Y = L²[0, 1] and Θ(c) = ‖c‖²_{L²}; they show that the method is highly susceptible to the presence of outliers. In Figure 5.1 (c) and (f) we present the reconstruction results of Algorithm 1 with X = L²[0, 1], Y = L^r[0, 1] with r = 1.1 and Θ(c) = ‖c‖²_{L²}. It can be seen that the method is robust enough to avoid being affected by the outliers. Using L^r misfit terms, with r close to 1, to exclude outliers has been investigated for several other regularization methods; see [4, 5, 8, 3].
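To make the setup of Example 5.1 concrete, the following sketch is illustrative only: the boundary values and the Gaussian-type source are placeholders rather than the exact data used in the experiments, and only the 400-subinterval grid is taken from the text. It assembles the standard central finite-difference discretization of −u″ + cu = f on (0, 1) and evaluates the pointwise L^r duality mapping J_r(ϕ) = |ϕ|^{r−1} sign(ϕ).

```python
import numpy as np

# Minimal sketch of the forward map of Example 5.1: solve -u'' + c u = f on (0,1)
# with Dirichlet boundary data (placeholder values) by central finite differences,
# and evaluate the L^r duality mapping used in the data-fit term.

def forward(c, f, u_left=1.0, u_right=1.0):
    m = len(c)                        # number of interior grid points
    h = 1.0 / (m + 1)
    main = 2.0 / h**2 + c             # tridiagonal system for -u'' + c u = f
    off = -1.0 / h**2 * np.ones(m - 1)
    A = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)
    rhs = f.copy()
    rhs[0] += u_left / h**2           # fold Dirichlet boundary values into the rhs
    rhs[-1] += u_right / h**2
    return np.linalg.solve(A, rhs)

def duality_map(phi, r):
    # J_r(phi) = |phi|^(r-1) * sign(phi), applied pointwise on the grid
    return np.abs(phi) ** (r - 1) * np.sign(phi)

m = 399                               # 400 subintervals as in Example 5.1
t = np.linspace(0.0, 1.0, m + 2)[1:-1]
f = 100.0 * np.exp(-10.0 * (t - 0.5) ** 2)   # Gaussian-type source, for illustration
c = np.ones(m)
u = forward(c, f)
print(duality_map(u - u.mean(), r=1.1)[:3])  # exercise J_r on a residual-like quantity
```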

Fig. 5.1 Numerical results for Example 5.1: (a) and (d) data with noise; (b) and (e) reconstruction results by Algorithm 1 with X = Y = L²[0, 1] and Θ(c) = ‖c‖²_{L²}; (c) and (f) reconstruction results by Algorithm 1 with X = L²[0, 1], Y = L^r[0, 1] and Θ(c) = ‖c‖²_{L²}.

Example 5.2 We next consider the two-dimensional problem on Ω = [0, 1] × [0, 1] with the source term f(x, y) = 100 e^{−10(x−0.5)²−10(y−0.5)²} and the boundary data g on ∂Ω. The sought solution is a piecewise constant function as shown in Figure 5.2 (a). We reconstruct the sought solution using Algorithm 1 with X = Y = L²(Ω) and different choices of Θ. In order to carry out the computation, we divide Ω into 100 × 100 small squares of equal size. All partial differential equations involved are solved approximately by a finite difference method. When using Algorithm 1, we take τ = 1.10, µ_0 = 0.90, µ_1 = 0.96 and the initial guess ξ_0 = c_0 = 0. The minimization problem determining c_n^δ for each n is solved by a restarted CG method with 1000 iterations.

Fig. 5.2 Reconstruction results for Example 5.2: (a) exact solution; (b) Algorithm 1 with Θ(c) = ‖c‖²_{L²}; (c) Algorithm 1 with Θ(c) = ½‖c‖²_{L²} + ‖c‖_{TV,ε} and ε = 10⁻⁴.

In Figure 5.2 we report the numerical results using measurement data corrupted by Gaussian noise with noise level δ = 0.001. Figure 5.2 (b) presents the reconstruction result using Θ(c) = ‖c‖²_{L²}. Due to the over-smoothing effect, the reconstruction contains unsatisfactory artifacts. Figure 5.2 (c) reports the reconstruction result using Θ(c) = λ‖c‖²_{L²} + ‖c‖_{TV,ε} with λ = 1/2, where ε = 10⁻⁴ and ‖c‖_{TV,ε} := ∫_Ω √(|∇c|² + ε), which can be considered as a smoothed approximation of the total variation functional ∫_Ω |∇c|. Clearly the result in (c) significantly improves on the one in (b) by efficiently removing the undesired artifacts.

Example 5.3 We use the same setup as in Example 5.2, but now the sought solution is sparse. The domain Ω is divided into 20 × 20 small squares of equal size in order to solve the associated partial differential equations. In Figure 5.3 we report the reconstruction results of Algorithm 1 using measurement data contaminated by Gaussian noise with noise level δ = 0.001. We take τ = 1.001, µ_0 = 0.90, µ_1 = 0.96 and the initial guess c_0 = ξ_0 = 0. The minimization problems involved in the algorithm are again solved by a restarted CG method with 300 iterations. The true solution is plotted in Figure 5.3 (a). Figure 5.3 (b) presents the numerical result of Algorithm 1 using Θ(c) = ‖c‖²_{L²}. Figure 5.3 (c) reports the numerical result of Algorithm 1 using Θ(c) = λ‖c‖²_{L²} + ∫_Ω √(c² + ε) with λ = 0.01 and ε = 10⁻⁴, which can be regarded as a smoothed approximation of the L¹ norm. A comparison of the results in (b) and (c) clearly shows that the sparsity of the sought solution is captured much better in (c). Therefore, a proper use of a convex function close to the L¹ norm can improve the reconstruction of sparse functions dramatically.

Fig. 5.3 Reconstruction results for Example 5.3: (a) exact solution; (b) Algorithm 1 with Θ(c) = ‖c‖²_{L²}; (c) Algorithm 1 with Θ(c) = λ‖c‖²_{L²} + ∫_Ω √(c² + ε), where λ = 0.01 and ε = 10⁻⁴.

References
1. M. Bachmayr and M. Burger, Iterative total variation schemes for nonlinear inverse problems, Inverse Problems, 25 (2009), 105004.
2. A. B. Bakushinsky and M. Yu. Kokurin, Iterative Methods for Approximate Solution of Inverse Problems, Math. Appl. (N.Y.) 577, Springer, Dordrecht, 2004.
3. M. S. Bazaraa, H. D. Sherali and C. M. Shetty, Nonlinear Programming: Theory and Algorithms, third edition, Wiley-Interscience, 2006.
4. A. Beck and M. Teboulle, A fast iterative shrinkage-thresholding algorithm for linear inverse problems, SIAM J. Imaging Sci., 2 (2009), 183–202.
5. A. Chambolle and T. Pock, A first-order primal-dual algorithm for convex problems with applications to imaging, J. Math. Imaging Vis., 40 (2011), 120–145.
6. I. Cioranescu, Geometry of Banach Spaces, Duality Mappings and Nonlinear Problems, Kluwer, Dordrecht, 1990.
7. F. Colonius and K. Kunisch, Stability for parameter estimation in two point boundary value problems, J. Reine Angew. Math., 370 (1986), 1–29.
8. H. W. Engl, M. Hanke and A. Neubauer, Regularization of Inverse Problems, Kluwer, Dordrecht, 1996.

3 In Figure 5. we report the numerical results using measurement data that is corrupted by Gaussian noise with noise level δ = 0.00. Figure 5. (b) presents the reconstruction result using Θ(c) = c L. Due to the over-smoothing effect, the reconstruction result turns out to contain unsatisfactory artifacts. Figure 5. (c) reports the reconstruction result using Θ(c) = λ c L + c T V,ε with λ = /, where ε = 0 4 and c T V,ε = c + ε which can be considered as a smoothed approximation of the Ω total variation functional c. Clearly the result in (c) significantly improves the one Ω in (b) by efficiently removing the undesired artifacts. Example 5.3 We use the same setup in Example 5. but now the sought solution is sparse. The domain Ω is divided into 0 0 small squares of equal size in order to solve the associated partial differential equations. In Figure 5.3 we report the reconstruction results of Algorithm using measurement data contaminated by Gaussian noise with noise level δ = 0.00. We use τ =.00, µ 0 = 0.90, µ = 0.96 and take c 0 = ξ 0 = 0 as an initial guess. The minimization problems involved in the algorithm are solved again by a restart CG after 300 iterations. The true solution is plot in Figure 5.3 (a). Figure 5.3 (b) presents the numerical result of Algorithm using Θ(c) = c L. Figure 5.3 (c) reports the numerical result of Algorithm using Θ(c) = λ c L + Ω c + ε with λ = 0.0 and ε = 0 4 which can be regarded as a smoothed approximation of the L norm. A comparison on the results in (b) and (c) clearly shows that the sparsity of the sought solution is significantly reconstructed in (c). Therefore, a proper use of a convex function close to the L norm can improve the reconstruction of sparse functions dramatically. (a) 0 (b) n δ = (c) n δ =0 9 8 7 6 5 4 3 0 4 3.5 3.5.5 0.5 0 9 8 7 6 5 4 3 0 Fig. 5.3 Reconstruction results for Example 5.3: (a) exact solution; (b) Algorithm with Θ(c) = c L ; (c) Algorithm with Θ(c) = λ c L + Ω c + ε, where λ = 0.0 and ε = 0 4. References. M. Bachmayr and M. Burger, Iterative total variation schemes for nonlinear inverse problems, Inverse Problems, 5 (0), 05004 (009).. A. B. Bakushinsky and M. Yu. Kokurin, Iterative Methods for Approximate Solutions of Inverse Problems, Math. Appl. (N.Y.) 577, Springer, Dordrecht, 004. 3. M. S. Bazaraa, H. D. Sherali and C. M. Shetty, Nonlinear Programming, Theory and Algorithms, Third edition, Wiley-Interscience, 006. 4. A. Beck and M. Teboulle, A fast iterative shrinkage-thresholding algorithm for linear inverse problems, SIAM J. Imaging Sci., (009), 83 0. 5. A. Chambolle and T. Pock, A first-order primal-dual algorithm for convex problems with applications to imaging, J. Math. Imaging Vis., 40 (0), 0 45. 6. I. Cioranescu, Geometry of Banach Spaces, Duality Mappings and Nonlinear Problems, Dordrecht: Kluwer, 990. 7. F. Colonius and K. Kunisch, Stability for parameter estimation in two point boundary value problems, J. Reine Angewandte math. 370 (986), 9. 8. H. W. Engl, M. Hanke and A. Neunauer, Regularization of Inverse Problems, Kluwer, Dordrecht, 996.