Functional-analytic tools and nonlinear equations. Johann Baumeister, Goethe University, Frankfurt, Germany. Rio de Janeiro / October 2017
Outline
- Fréchet differentiability of the (PtS) mapping
- Nonlinear equations
- HUM for nonlinear problems
- Range invariance condition
- Iterative methods for solving nonlinear ill-posed problems
- Tangential cone condition

October 13, 2017
Nonlinear problem

Nonlinear equation $F(x) = y$.
- Solvability: the theory is more specific than in the linear case.
- Identification: in general, parameter-to-solution mappings are nonlinear.
- Identification: the equation is ill-posed due to the lack of stability.
- Linearization for ill-posed problems: works, but is much more delicate.
- Computational schemes: difficult for noisy data.

Equation: $F(x) = y$. Exact data: $F(x^\dagger) = y$. Noisy data: given $y^\varepsilon$ with $\|y^\varepsilon - y\|_Y \le \varepsilon$, find $x^\varepsilon$ with $F(x^\varepsilon) \approx y^\varepsilon$.
Fréchet-derivative

Definition. Let $X, Y$ be Banach spaces and let $F: X \supseteq \mathrm{dom}(F) \to Y$ be a mapping with domain of definition $\mathrm{dom}(F)$. Let $x_0 \in X$ be an interior point of $\mathrm{dom}(F)$. $F$ is called Fréchet-differentiable in $x_0$ iff there exists a linear continuous operator $T: X \to Y$ such that
\[ \|F(x) - F(x_0) - T(x - x_0)\|_Y = o(\|x - x_0\|_X) \quad \text{as } x \to x_0. \]
$T$ (which is uniquely determined) is called the Fréchet-derivative in $x_0$, and we write $F'(x_0)$ for $T$.
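The defining $o(\|x - x_0\|)$ property can be checked numerically. Below is a minimal Python sketch for an assumed toy map $F: \mathbb{R}^2 \to \mathbb{R}^2$ (the map, the base point and the increments are illustrative choices, not from the lecture); the candidate $T$ is the Jacobian, and the remainder ratio should tend to $0$ with $\|h\|$.

```python
import math

# Hypothetical smooth map F: R^2 -> R^2, used only to illustrate the definition.
def F(x):
    return [x[0]**2 + x[1], x[0] * x[1]]

def dF(x, h):
    # Candidate Frechet derivative (the Jacobian of F) applied to h.
    return [2 * x[0] * h[0] + h[1], x[1] * h[0] + x[0] * h[1]]

def remainder_ratio(x0, h):
    # ||F(x0+h) - F(x0) - dF(x0)h|| / ||h||  -- should tend to 0 as ||h|| -> 0.
    x = [x0[0] + h[0], x0[1] + h[1]]
    Fx, Fx0, Th = F(x), F(x0), dF(x0, h)
    r = [Fx[i] - Fx0[i] - Th[i] for i in range(2)]
    return math.hypot(r[0], r[1]) / math.hypot(h[0], h[1])

x0 = [1.0, 2.0]
ratios = [remainder_ratio(x0, [t, t]) for t in (1e-1, 1e-2, 1e-3)]
```

For this quadratic toy map the remainder is exactly of order $\|h\|^2$, so the ratios decay linearly in $t$.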
Fréchet derivative-1

How to compute the Fréchet-derivative of a parameter-to-solution mapping?

Parameter-to-solution mapping (PtS): $F: Q_{ad} \ni q \mapsto u \in V$, where $u := F(q)$ solves the variational equation
\[ a_0(u, v) + a_1(q; u, v) = \langle f, v\rangle \quad \text{for all } v \in V. \]
Later on we are interested in the Fréchet-derivative of $Q_{ad} \ni q \mapsto \iota F(q) \in H$, where $\iota$ is the embedding of $V$ into $H$. But this is then obvious.
Fréchet-derivative-2

Assumptions (last lecture):
(1) Given a Gelfand triple $V \hookrightarrow H \hookrightarrow V'$ of Hilbert spaces, used to describe the state.
(2) Given a Hilbert space $P$, used to describe the parameters.
(3) $Q_{ad}$ is a subset of $P$ which describes the admissible parameters.
(4) Given for each $q \in Q_{ad}$ a bilinear form $a_0(\cdot,\cdot) + a_1(q;\cdot,\cdot): V \times V \to \mathbb{R}$.
(5) For each $q \in Q_{ad}$ there exist constants $\gamma_0 \ge 0$, $\gamma_1 \ge 0$, $\gamma(q) > 0$ such that
\[ |a_0(u,v)| \le \gamma_0 \|u\|_V \|v\|_V, \quad |a_1(q; u, v)| \le \gamma_1 \|u\|_V \|v\|_V, \quad u, v \in V, \]
and $a_0(u,u) + a_1(q; u, u) \ge \gamma(q)\|u\|_V^2$ for all $u \in V$.
(6) Given $f \in V'$.

By the Lax-Milgram lemma, (PtS) is well defined!
Fréchet-derivative of a (PtS) mapping

Let $p \in Q_{ad}$ be an interior point of $Q_{ad}$, i.e. $B_\rho(p) \subseteq Q_{ad}$ for some $\rho > 0$, and let $h \in P$ be a (small) increment. We consider the variational equation for the parameters $p$, $p+h$ with associated solutions $u^p := F(p)$ and $u^{p+h} := F(p+h)$, respectively. Hence:
\[ a_0(u^p, v) + a_1(p; u^p, v) = \langle f, v\rangle \quad \text{for all } v \in V, \]
\[ a_0(u^{p+h}, v) + a_1(p+h; u^{p+h}, v) = \langle f, v\rangle \quad \text{for all } v \in V. \]
Subtracting these equations (using linearity of $a_1$ in the parameter) we obtain
\[ a_0(u^{p+h} - u^p, v) + a_1(p; u^{p+h} - u^p, v) = -a_1(h; u^{p+h}, v) \quad \text{for all } v \in V. \tag{*} \]
(1) This is just an informal calculation, since $a_1$ is not defined for parameters $h$ which do not necessarily belong to $Q_{ad}$.
(2) Accepting (*) we can guess what the Fréchet-derivative of $F$ in $p$ should be: $z := F'(p)(h)$ should solve the variational problem
\[ a_0(z, v) + a_1(p; z, v) = -a_1(h; u^p, v) \quad \text{for all } v \in V. \]
Fréchet-derivative of a (PtS) mapping-1

The assumptions which are sufficient for the continuation of the proof of Fréchet-differentiability are the following:

Assumption:
- $a_1$ is defined on $P \times V \times V$ and the following estimate holds with a constant $c_1 \ge 0$: $|a_1(r; u, v)| \le c_1 \|r\|_P \|u\|_V \|v\|_V$ for all $r \in P$, $u, v \in V$.
- There exist $r \in (0, \rho)$ and $\gamma > 0$ such that $\gamma(q) \ge \gamma$ for all $q \in B_r(p)$.

The assumptions above concerning the estimate of the trilinear form can be verified in the context of the a/b/c-problems, depending on the dimension of the domain on which the differential operator is formulated. Now we continue the proof, with $h \in B_r(\theta)$, in three steps.
Fréchet-derivative of a (PtS) mapping-2

Step 1: Boundedness of $F$ in $B_r(p)$.
Let $q \in B_r(p)$ and let $u^q := F(q)$. Then we conclude from the variational equation for $u^q$:
\[ \gamma(q)\|u^q\|_V^2 \le a_0(u^q, u^q) + a_1(q; u^q, u^q) = \langle f, u^q\rangle \le \|f\|_{V'}\|u^q\|_V \le \frac{\gamma(q)}{2}\|u^q\|_V^2 + \frac{1}{2\gamma(q)}\|f\|_{V'}^2. \]
With the assumption above we conclude
\[ \|u^q\|_V^2 \le \frac{1}{\gamma(q)^2}\|f\|_{V'}^2 \le \frac{1}{\gamma^2}\|f\|_{V'}^2. \]
Fréchet-derivative of a (PtS) mapping-3

Step 2: Local Lipschitz continuity of $F$ at $p$.
Let $h \in B_r(\theta)$, let $u^p := F(p)$, $u^{p+h} := F(p+h)$ and $w := u^{p+h} - u^p$. Then we have
\[ \gamma(p)\|w\|_V^2 \le a_0(w, w) + a_1(p; w, w) = -a_1(h; u^{p+h}, w) \le c_1 \|h\|_P \|u^{p+h}\|_V \|w\|_V \le c_1' \|h\|_P \|w\|_V, \]
where the constant $c_1' \ge 0$ absorbs the bound on $\|u^{p+h}\|_V$ from Step 1. Now we can continue as in Step 1 to obtain $\|u^{p+h} - u^p\|_V \le c_2 \|h\|_P$ with a constant $c_2 \ge 0$.
Fréchet-derivative of a (PtS) mapping-4

Step 3: Fréchet-differentiability of $F$ in $p$.
Let $w := u^{p+h} - u^p - z$ (see above). Then, using the estimates in Step 1 and Step 2,
\[ \gamma(p)\|w\|_V^2 \le a_0(w, w) + a_1(p; w, w) = -a_1(h; u^{p+h} - u^p, w) \le c_3 \|h\|_P^2 \|w\|_V \]
with a constant $c_3 \ge 0$. Continuing as in Steps 1 and 2 we obtain
\[ \|u^{p+h} - u^p - z\|_V \le c_3' \|h\|_P^2 \]
with a constant $c_3' \ge 0$. Hence the mapping $F: Q_{ad} \to H$ is Fréchet-differentiable in $p$. Later on we need the adjoint $F'(p)^*$. We omit the computation of this adjoint here; it can be obtained by computing a variational solution too.
General nonlinearity/assumptions: $F(x^\dagger) = y$

Standard assumptions:
F0) $F: X \supseteq \mathrm{dom}(F) \to Y$; $X$, $Y$ are Banach spaces. Here we assume: $X$, $Y$ are Hilbert spaces, $F$ is weakly sequentially continuous, $\mathrm{dom}(F)$ is weakly sequentially closed, and there exist $x_0 \in X$, $\rho > 0$ with $B_\rho(x_0) \subseteq \mathrm{dom}(F)$.
F1) $F$ is Fréchet-differentiable in $B_\rho(x_0)$ with derivative $F'$. There exists $m \ge 0$ with $\|F'(x)\|_{X \to Y} \le m$, $x \in B_\rho(x_0)$.
F2) $x^\dagger \in B_{\rho/2}(x_0)$.
F3) $F'(x^\dagger)$ is injective, but its range is not closed.

Consequence of F3): solving the linearized equation $F'(x^\dagger)h = z$ is an ill-posed problem.
Assumptions/Linearization

Quantitative reformulation of F1):
F1') $F$ is Fréchet-differentiable in the ball $B_\rho(x_0) \subseteq \mathrm{dom}(F)$.
- $U \subseteq X$ is a Banach space with $\|u\|_X \le \|u\|_U$ for all $u \in U$.
- There exist $r \in (0, \rho/2)$ and $E > 0$ such that $x^\dagger \in B_r(x_0) \cap B_E^U(\theta)$.
- There exists $\nu \in (1, 2]$ such that
\[ \|F(x) - F(x^\dagger) - F'(x^\dagger)(x - x^\dagger)\|_Y \le c_1 \|x - x^\dagger\|_X^\nu \quad \text{for } x \in B_r^X(x^\dagger) \cap B_E^U(\theta), \]
with some constant $c_1 := c_1(x^\dagger, r, E)$.

The assumption $x^\dagger \in B_{\rho/2}(x_0) \cap B_E^U(\theta)$ may be considered as a source-type assumption.
Linearization

Theorem. Suppose that the assumptions F0), F1), F2), F3) hold. Moreover, assume that
\[ \|x - x^\dagger\|_X \le c_2 \|F'(x^\dagger)(x - x^\dagger)\|_Y^\mu, \quad x \in B_r^X(x^\dagger) \cap B_E^U(\theta), \]
for some constant $c_2 := c_2(x^\dagger, r, E)$, where $\mu > 0$ with $\mu\nu > 1$. Then there exist $\bar r \in (0, r)$ and a constant $c := c(x^\dagger, \bar r, E)$ such that the following estimate holds:
\[ \|x - x^\dagger\|_X \le c \|F(x) - F(x^\dagger)\|_Y^\mu, \quad x \in B_{\bar r}^X(x^\dagger) \cap B_E^U(\theta). \]
Proof: see
C.D. Pagani: Questions of stability for inverse problems. Rend. Sem. Mat. Fis. Milano 52, 1982.
Notice: $\mu \in (0, 1)$ due to assumption F3).
Stability estimate

Theorem. Under the assumptions of the theorem above we have
\[ \sup\{\|x - x^\dagger\|_X : x \in B_{\bar r}^X(x^\dagger) \cap B_E^U(\theta),\ \|F(x) - F(x^\dagger)\|_Y \le \varepsilon\} \le c(x^\dagger, \bar r, E)\,\varepsilon^\mu. \]

How to find a space $U$ such that the assumptions above hold?

HUM for nonlinear problems:
- Spaces $V_x = \mathrm{ran}(F'(x)^*)$, $x \in B_\rho^X(x_0)$.
- Gelfand triples $V_x \hookrightarrow X \hookrightarrow V_x'$, $x \in B_\rho^X(x_0)$.
- Problem: is $V_{\tilde x} = V_x$ for all $\tilde x, x \in B_\rho^X(x_0)$?
- Answer: range invariance condition.
Range Invariance

Range Invariance Condition: there exist linear bounded operators $R(\tilde x, x)$ satisfying
\[ F'(\tilde x) = R(\tilde x, x)\,F'(x), \quad \|R(\tilde x, x) - \mathrm{id}\| \le c_R, \quad \tilde x, x \in B_\rho^X(x_0), \]
for some $0 < c_R < 1$.

Consequence for the HUM-approach: $V_{\tilde x} = V_x$ for all $\tilde x, x \in B_\rho^X(x_0)$.

Sufficient conditions for the range invariance condition?
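Why the range invariance condition yields equality of the spaces $V_{\tilde x}$, $V_x$ can be sketched in a few lines (my reconstruction of the standard Neumann-series argument):

```latex
% Taking adjoints in F'(\tilde x) = R(\tilde x, x)\, F'(x) gives
\[
  F'(\tilde x)^* \;=\; F'(x)^*\, R(\tilde x, x)^* .
\]
% Since \|R(\tilde x, x) - \mathrm{id}\| \le c_R < 1, the Neumann series
\[
  R(\tilde x, x)^{-1} \;=\; \sum_{k=0}^{\infty} \bigl(\mathrm{id} - R(\tilde x, x)\bigr)^{k}
\]
% converges in B(Y,Y), so R(\tilde x, x)^* is an isomorphism of Y and hence
\[
  V_{\tilde x} \;=\; \operatorname{ran}\bigl(F'(\tilde x)^*\bigr)
  \;=\; \operatorname{ran}\bigl(F'(x)^*\bigr) \;=\; V_x .
\]
```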
Range Invariance

Theorem (Douglas et al.). Suppose that the assumptions F0), F1), F2) are satisfied. Then for $x \in B_\rho^X(x_0)$ the following conditions are equivalent:
a) $\mathrm{ran}(F'(x)^*) \subseteq V_{x^\dagger}$.
b) $F'(x^\dagger)$ majorizes $F'(x)$, i.e. there exists a constant $M > 0$ such that $\|F'(x)h\|_Y \le M\|F'(x^\dagger)h\|_Y$ for all $h \in X$.
c) There exists a continuous linear operator $R: Y \to Y$ with $F'(x) = R \circ F'(x^\dagger)$.

B.A. Barnes: Majorization, range inclusion, and factorization for bounded linear operators. PAMS 133, 2004.
R. Douglas: On majorization, factorization, and range inclusion of operators on Hilbert space. PAMS 17, 1966.
I. Serban and F. Turcu: Compact perturbations and factorizations of closed range operators. Preprint, 2008.
Range Invariance-1

Theorem. Suppose that F0), F1), F2) are satisfied. In addition we assume:
- $F$ is twice Fréchet-differentiable.
- $F'(x^\dagger)$ majorizes $F'(x)$ and $F'(x)$ majorizes $F'(x^\dagger)$ for all $x \in B_\rho^X(x_0)$.
Then there exist $r \in (0, \rho)$ and a family $(R_x)_{x \in B_r^X(x_0)}$ of continuous linear operators with
a) $\mathrm{ran}(F'(x)^*) = V_{x_0}$ for all $x \in B_r^X(x_0)$.
b) $F'(x) = R_x \circ F'(x_0)$ for all $x \in B_r^X(x_0)$.
c) $\|R_x - \mathrm{id}\| \le c\,\|x - x_0\|_X$ for all $x \in B_r^X(x_0)$.
Range Invariance-2

Proof: The theorem of implicit functions is applied to
\[ G: B_\rho^X(x_0) \times B(Y, Y) \ni (x, Q) \mapsto F'(x)^* - F'(x_0)^* \circ Q \in B(Y, V_{x_0}). \]
This mapping $G$ is well defined due to the theorem of Douglas. Obviously, $G$ is continuously differentiable. Moreover,
\[ G(x_0, \mathrm{id}) = \Theta, \quad G_x(x_0, \mathrm{id})(h) = \bigl(F''(x_0)(h, \cdot)\bigr)^*, \ h \in X, \quad G_Q(x_0, \mathrm{id})(H) = -F'(x_0)^* \circ H, \ H \in B(Y, Y). \]
By the HUM-approach, $F'(x_0)^*$ is an isomorphism from $Y$ onto $V_{x_0}$; hence $G_Q(x_0, \mathrm{id})$ is an isomorphism from $B(Y, Y)$ onto $B(Y, V_{x_0})$. Now, by an application of the implicit function theorem we obtain $r > 0$ and a family $(Q_x)_{x \in B_r(x_0)}$ of continuous linear operators with
\[ F'(x)^* = F'(x_0)^* \circ Q_x, \quad \|Q_x - \mathrm{id}\| \le c\,\|x - x_0\|_X \quad \text{for all } x \in B_r(x_0). \]
To get the desired result, we set $R_x := Q_x^*$, $x \in B_r(x_0)$.
Range Invariance-3 More on these subjects can be found in J. Baumeister Linear ill-posed problems in Banach spaces: Banach uniqueness method and stability. Working paper, 2009 J. Baumeister Nonlinear ill-posed problems in Banach spaces: stability and regularization. Working paper, 2010
Iterative methods

We consider again an equation $F(x) = y$ where only noisy data $y^\varepsilon$ for $y$ are available. Here is a list of iterative methods for nonlinear ill-posed problems which are well analyzed and efficiently implemented:
- Landweber method
- Iterated Tikhonov method
- Levenberg-Marquardt method
- Gauss-Newton method
- Newton-type methods
All these methods may benefit from a Kaczmarz-type implementation.
Kaczmarz-type implementation

Suppose we have a mapping $f: \mathrm{dom}(f) \to R$ and an iterative method
\[ x^{k+1} := I(x^k, f), \quad k \in \mathbb{N}_0. \]
We assume that we can decompose the mapping $f$ as follows:
\[ f = (f_0, \dots, f_{N-1}): \mathrm{dom}(f_0) \cap \dots \cap \mathrm{dom}(f_{N-1}) \to R := R_0 \times \dots \times R_{N-1}. \]
Then we may reformulate the iteration above in a cyclic way:
\[ x^{k+1} = I(x^k, f_{[k]}), \quad k \in \mathbb{N}_0, \quad \text{where } [k] = k \bmod N. \]
A cycle is a sequence $x^m, \dots, x^{m+N}$ of iterates where $m$ is a multiple of $N$. This idea goes back to a method of Kaczmarz which has strong applications in computed tomography.
S. Kaczmarz: Angenäherte Auflösung von Systemen linearer Gleichungen. Bull. Acad. Polon. A35, 1937.
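The cyclic scheme can be illustrated with the classical Kaczmarz method, where the $i$-th sub-problem $f_i$ is the $i$-th row of a linear system and one iteration projects onto the corresponding hyperplane. A minimal pure-Python sketch (the $2\times 2$ system and the iteration count are toy assumptions):

```python
# Cyclic Kaczmarz sketch for a linear system A x = b: each row a_i . x = b_i
# is one sub-problem f_i; step k uses sub-problem [k] = k mod N.
A = [[3.0, 1.0], [1.0, 2.0]]   # toy system with solution x* = (1, 1)
b = [4.0, 3.0]
N = len(A)

def kaczmarz_step(x, i):
    # Orthogonal projection of x onto the hyperplane a_i . x = b_i.
    a = A[i]
    resid = b[i] - (a[0] * x[0] + a[1] * x[1])
    scale = resid / (a[0]**2 + a[1]**2)
    return [x[0] + scale * a[0], x[1] + scale * a[1]]

x = [0.0, 0.0]
for k in range(200):           # 100 full cycles
    x = kaczmarz_step(x, k % N)
```

For a consistent system the iterates converge to the solution; the rate per cycle depends on the angle between the row normals.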
Landweber method

The Landweber method is an iterative method to solve
\[ x \in \operatorname{argmin} \|F(x) - y^\varepsilon\|_Y^2. \]
The necessary condition
\[ F'(x)^*(F(x) - y^\varepsilon) = \theta \]
suggests the fixed point iteration
\[ x^{k+1} := x^k - \omega_k F'(x^k)^*(F(x^k) - y^\varepsilon), \quad k \in \mathbb{N}_0, \]
where $(\omega_k)_{k \in \mathbb{N}_0}$ is a sequence of stepsize parameters which should guarantee convergence and efficiency. Clearly, an analysis should show that the iteration operator $x \mapsto x - \omega F'(x)^*(F(x) - y^\varepsilon)$ has the contraction property, at least in the case of exact data.
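The Landweber iteration can be sketched on a scalar toy problem; the map $F(x) = x^3$, the exact data $y = 8$ (so $x^\dagger = 2$), the starting guess and the step size below are illustrative assumptions, not from the lecture.

```python
# Minimal Landweber sketch for the scalar toy problem F(x) = x**3, y = 8.
def F(x):
    return x**3

def dF(x):
    return 3 * x**2           # F'(x); for a scalar problem the adjoint is the same number

y = 8.0
omega = 0.005                 # fixed step size; needs omega * F'(x)**2 < 2 near the solution
x = 1.5                       # starting guess inside the region of convergence
for k in range(2000):
    x = x - omega * dF(x) * (F(x) - y)
```

With exact data the iterates converge to $x^\dagger = 2$; with noisy data one would additionally need a stopping rule (see the discrepancy principle later).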
Landweber method/references There is a huge number of papers concerning this subject. L. Landweber An iteration formula for Fredholm integral euqations of the first kind. Amer. J. Math. 73, 1951 M. Hanke and A. Neubauer and O. Scherzer A convergence analysis of the Landweber iteration for nonlinear ill-posed problems. Numerische Mathematik 72, 1995
An inverse problem and the Landweber iteration

Model:
\[ -\nabla\cdot(q\,\nabla w) = 1 \ \text{in } \Omega \subseteq \mathbb{R}^2, \qquad w = 0 \ \text{on } \partial\Omega. \]
The boundary of $\Omega$ is assumed to be smooth. The corresponding parameter-to-solution mapping is defined as
\[ F: \mathrm{dom}(F) := H^2_+(\Omega) \ni q \mapsto w \in L^2(\Omega), \]
where $F(q)$ is the solution of the boundary value problem above and
\[ H^2_+(\Omega) := \{q \in H^2(\Omega) : \operatorname{ess\,inf} q > 0\}. \]
$H^2_+(\Omega)$ is an open subset of $H^2(\Omega)$. The parameter-to-solution map is well defined due to the Lax-Milgram lemma.
Inverse problem and the Landweber iteration-1

The associated inverse problem is: for $w \in L^2(\Omega)$ find $q \in H^2_+(\Omega)$ such that $F(q) \approx w$. Clearly, for the solution of this problem we should consider the case of noisy data. In the paper
D. Garmatter, B. Haasdonk and B. Harrach: A reduced basis Landweber method for nonlinear inverse problems. Inverse Problems 32(3), 2016,
the Landweber method is used to solve the inverse problem. The focus is on the efficiency of the implementation: each Landweber step requires the solution of two forward problems, and the numerical solution is organized in a way which uses the idea of a reduced basis. This idea is related to the idea of model reduction sketched in the first lecture.
Iterated Tikhonov method

The so-called iterated Tikhonov method is an iterative method to solve
\[ x \in \operatorname{argmin}\bigl( \|F(x) - y^\varepsilon\|_Y^2 + \alpha\|x - x^+\|_X^2 \bigr), \]
where $x^+$ is an approximation for the solution $x^\dagger$ we are looking for. The necessary condition
\[ F'(x)^*(F(x) - y^\varepsilon) + \alpha(x - x^+) = \theta \]
suggests the fixed point iteration
\[ x^{k+1} := x^k - \alpha^{-1} F'(x^k)^*(F(x^{k+1}) - y^\varepsilon), \quad k \in \mathbb{N}_0. \]
The iteration above is an implicit method. Why implement such an expensive method? Notice that the minimization of the Tikhonov functional has to be carried out several times, since the best regularization parameter $\alpha$ is not known; this is in relation to the number of iterations.
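For a linear scalar model the implicit step can be solved in closed form, which makes the mechanism easy to see. A minimal sketch under assumed toy values ($a$, $y$, $\alpha$ below are illustrative; a small $a$ mimics an ill-conditioned forward operator):

```python
# Scalar sketch of the (stationary) iterated Tikhonov method for the linear
# model a * x = y; the implicit step can be written out explicitly here.
a = 0.1                       # "forward operator"; small value mimics ill-conditioning
y = 0.3                       # exact data, so the solution is x* = y / a = 3
alpha = 0.05                  # fixed regularization parameter
x = 0.0
for k in range(400):
    # x_{k+1} solves  a * (a * x_{k+1} - y) + alpha * (x_{k+1} - x_k) = 0
    x = (alpha * x + a * y) / (a**2 + alpha)
```

Each step contracts the error by the factor $\alpha/(a^2+\alpha)$, so unlike one-shot Tikhonov regularization the iteration removes the regularization bias as $k$ grows.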
Iterated Tikhonov method/references J. Baumeister, A. De Cezaro and A. Leitão Modified iterated Tikhonov methods for solving systems of nonlinear equations Inverse Problems and Imaging 5, 2011. C.W. Groetsch and O. Scherzer Nonstationary iterated Tikhonov-Morozov method and third order differential equations for the evaluation of unbounded operators. Math. Meth. Appl. Sci. 23, 2000 O. Scherzer Convergence rate for iterated Tikhonov regularized solutions of nonlinear ill-posed problems. Numerische Mathenatik 66, 1993
Gauss-Newton method

The necessary condition $F'(x)^*(F(x) - y^\varepsilon) = \theta$ for the nonlinear least squares problem is the starting point for the Landweber method. To solve this equation one may formulate a Newton method. Suppose that an approximation $x^k$ is known. Then we compute a correction term $s^k$ from
\[ \bigl(F'(x^k)^* F'(x^k) + E_k\bigr)(s^k) = -F'(x^k)^*(F(x^k) - y^\varepsilon), \]
where $E_k$ is a term of second order (containing $F''$) which we neglect. Then we obtain the following iteration: $x^{k+1} := x^k + \omega_k s^k$, where $s^k$ solves
\[ F'(x^k)^* F'(x^k)(s^k) = -F'(x^k)^*(F(x^k) - y^\varepsilon), \quad k \in \mathbb{N}_0, \]
and $(\omega_k)_{k \in \mathbb{N}_0}$ is a sequence of stepsize parameters.
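With a single unknown the normal equation $F'(x)^2 s = -F'(x)(F(x)-y^\varepsilon)$ collapses to $s = -(F(x)-y^\varepsilon)/F'(x)$, i.e. the classical Newton step for $F(x) = y$. A minimal sketch on the same toy problem as before ($F(x) = x^3$, $y = 8$, exact data; all values are illustrative assumptions):

```python
# Gauss-Newton sketch for the scalar toy problem F(x) = x**3 with data y = 8.
def F(x):
    return x**3

def dF(x):
    return 3 * x**2

y = 8.0
x = 3.0                       # starting guess
for k in range(8):
    s = -(F(x) - y) / dF(x)   # normal-equation step; omega_k = 1 in this well-posed toy case
    x = x + s
```

For this well-posed scalar case the convergence is fast; for ill-posed problems the inner linear solve itself must be regularized, as discussed on the next slide.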
Gauss-Newton method/remark and References Since we neglect the term of second order we cannot expect convergence of second order even in the case of exact data in a well-posed problem. In each step of the iteration above we have to compute the correction term s k. Since this is an ill-posed problem, in general, one has to apply a stable method to get s k (Truncation of singular values, regularized least squares,... ). Therefore the Gaus-Newton method may be considered as the combination of an outer iteration (x k x k+1 = ωs k ) and an inner iteration for the computation of s k. A variant of the Gauss-Newton method is the iteratively regularized Gauss-Newton method. It results in the iteration x k+1 := x k s k, k IN 0, where (α k id + F (x k ) F (x k ))(s k ) = F (x k ) (F (x k ) y ε ) + α k (x k x + ) The regularizing sequence (α k ) k IN0 has to be chosen in an appropriate way in order to obtain convergence results.
Gauss-Newton method/references B. Kaltenbacher and A. Neubauer and O. Scherzer On convergence rates for the iteratively regularized Gauss-Newton method. IMA Journal of Numerical Analysis 17.3, 1997 B. Kaltenbacher and A. Neubauer and A.G. Ramm Convergence rates of the continous regularized Gauss-Newton method. J. Inv. Ill-posed Problems 10, 2002 Q. Jin and M. Zhong On the itaratively regularized Gauss-Newton method in Banach spaces with applications tp parameter identification problems. 2013
Levenberg-Marquardt method

The Gauss-Newton method requires a solution $s^k$ of
\[ F'(x^k)^* F'(x^k)(s^k) = -F'(x^k)^*(F(x^k) - y^\varepsilon). \]
The idea of the Levenberg-Marquardt method is to stabilize this step: one uses a parameter $\alpha > 0$ and solves instead
\[ \bigl(F'(x^k)^* F'(x^k) + \alpha\,\mathrm{id}\bigr)(s^k) = -F'(x^k)^*(F(x^k) - y^\varepsilon). \]
Then we obtain the following iteration:
\[ x^{k+1} := x^k - \omega_k \bigl(F'(x^k)^* F'(x^k) + \alpha\,\mathrm{id}\bigr)^{-1} F'(x^k)^*(F(x^k) - y^\varepsilon), \quad k \in \mathbb{N}_0, \]
where $(\omega_k)_{k \in \mathbb{N}_0}$ is a sequence of stepsize parameters.
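In the scalar case the stabilized step reads $s = -F'(x)(F(x)-y^\varepsilon)/(F'(x)^2 + \alpha)$. A minimal sketch on the same toy problem ($F(x)=x^3$, $y=8$, $\alpha$ and the iteration count are illustrative assumptions):

```python
# Scalar Levenberg-Marquardt sketch on the toy problem F(x) = x**3, y = 8.
def F(x):
    return x**3

def dF(x):
    return 3 * x**2

y = 8.0
alpha = 1.0                   # damping parameter stabilizing the linear solve
x = 3.0
for k in range(100):          # omega_k = 1 throughout
    g = dF(x)
    x = x - g * (F(x) - y) / (g**2 + alpha)
```

Compared with the undamped Gauss-Newton step, $\alpha$ shortens the step where $F'$ is small, which is exactly what stabilizes the method for ill-conditioned problems.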
Levenberg-Marquardt method/references M. Hanke A regularizing Levenberg-Marquardt scheme, with applications to inverse groundwater filtration. Inverse Problems 13, 1997 Marquardt, D.W. An algorithm for least-squares estimation of nonlinear parameters. J. Soc. Indust. Appl. Math. 11, 1963 J. Baumeister, B. Kaltenbacher and A. Leitão On Levenberg-Marquardt-Kaczmarz iterative methods for solving systems of nonlinear equations. Inverse Problems and Imaging 4, 2010
Newton-type methods

A Newton method applied to the equation $F(x) - y^\varepsilon = \theta$ (actually $F(x) - y^\varepsilon \approx \theta$) uses the linearized equation. Let $x^k$ be the actual approximation of its solution. Then the next approximation $x^{k+1}$ is computed by solving
\[ F'(x^k)(x^{k+1} - x^k) = -(F(x^k) - y^\varepsilon). \]
This leads to the following iteration: $x^{k+1} := x^k + \omega_k s^k$, where $s^k$ solves
\[ F'(x^k)(s^k) = -(F(x^k) - y^\varepsilon), \quad k \in \mathbb{N}_0, \]
and $(\omega_k)_{k \in \mathbb{N}_0}$ is a sequence of stepsize parameters. The iteration method above may be considered as a combination of an outer iteration ($x^k \mapsto x^{k+1} := x^k + \omega_k s^k$) and an inner iteration for the computation of $s^k$. Newton-type methods differ in the way the inner iteration is implemented.
Newton-type methods/references A. Rieder On convergence rates of inexact Newton regularization. Numerische Mathematik 88, 2001 A. Lechleiter Towards a general convergence theory for inexact Newton regularization. Numerische Mathematik 114, 2010 F. Margotti On Inexact Newton Methods for Inverse Problems in Banach Spaces. Thesis Universität Karlsruhe, 2015 A. Rieder and F. Margotti An inexact Newton regularization in Banach spaces based on the nonstationary iterated Tikhonov method. Journal of Inverse and Ill-posed Problems 23, 2013 B. Kaltenbacher Some Newton-type methods for the regularization of nonlinear ill-posed problems. Inverse Problems 13, 1997
Remarks concerning convergence proofs for iterative methods

- For all the methods above, the tangential cone condition is essential. It helps to control the quantities $\|x^k - x^\dagger\|_X$.
- In general, a main step in the convergence proof in the noisy case is the proof of monotonicity of (a part of) the sequence $(\|x^k - x^\dagger\|_X)_{k \in \mathbb{N}}$.
- A stopping rule is not very important in the case of exact data. For noisy data it is essential, since one cannot have convergence of the iterates in general. The stopping index plays the role of a regularization parameter. There are various stopping rules on the market: discrepancy methods, lopping rule, ...
- In the case of a system of linear equations the Kaczmarz idea is also used in a symmetric form: a symmetric cycle is a cycle forward followed by a cycle backwards. This has nice consequences for the convergence rate.
- For each Kaczmarz-type implementation of the methods above a randomized version can be formulated by choosing $f_l$ from the decomposition $f = (f_0, \dots, f_{N-1})$ at random in each iteration step.
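The role of the stopping index as a regularization parameter can be illustrated with Morozov's discrepancy principle: stop at the first $k$ with $\|F(x^k) - y^\varepsilon\| \le \tau\varepsilon$, $\tau > 1$. A sketch for Landweber on the scalar toy problem ($F$, the noise level and $\tau$ below are assumptions):

```python
# Discrepancy-principle stopping for Landweber on the toy problem F(x) = x**3.
def F(x):
    return x**3

def dF(x):
    return 3 * x**2

eps = 0.1                     # assumed noise level
y_eps = 8.0 + eps             # noisy data for y = 8 (exact solution x = 2)
tau = 1.1                     # tau > 1, as usual for Morozov's discrepancy principle
omega, x, k_stop = 0.005, 1.5, None
for k in range(2000):
    if abs(F(x) - y_eps) <= tau * eps:
        k_stop = k            # stopping index = regularization parameter
        break
    x = x - omega * dF(x) * (F(x) - y_eps)
```

Iterating far beyond `k_stop` would drive the iterate toward the solution of the *noisy* equation, so stopping early is what keeps the error with respect to the exact solution small.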
Tangential cone condition-a remark

Tangential cone condition (TCC):
\[ \|F(\tilde x) - F(x) - F'(x)(\tilde x - x)\|_Y \le \eta\,\|F(\tilde x) - F(x)\|_Y, \quad \tilde x, x \in B_\rho^X(x_0), \]
with $\eta \in (0, \tfrac{1}{2})$. Important for the convergence analysis! See for instance:
B. Kaltenbacher, A. Neubauer and O. Scherzer: Iterative Regularization Methods for Nonlinear Ill-Posed Problems. De Gruyter, 2008.
J. Baumeister, B. Kaltenbacher and A. Leitão: On Levenberg-Marquardt Kaczmarz methods for regularizing systems of nonlinear ill-posed equations. 2009; submitted.
J. Baumeister, A. De Cezaro and A. Leitão: Modified iterated Tikhonov methods for solving systems of nonlinear ill-posed equations. Inverse Problems and Imaging 5, 2011.
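For a concrete map the constant $\eta$ can be estimated by brute force on a grid. A sketch for the scalar toy map $F(x) = x^3$ on an assumed ball around $x_0 = 2$ (map, center and radius are illustrative; on a small enough ball the ratio stays below $\tfrac{1}{2}$):

```python
# Numerical estimate of the TCC constant eta for F(x) = x**3 on [1.8, 2.2].
def F(x):
    return x**3

def dF(x):
    return 3 * x**2

def tcc_ratio(x, xt):
    # |F(xt) - F(x) - F'(x)(xt - x)| / |F(xt) - F(x)|
    num = abs(F(xt) - F(x) - dF(x) * (xt - x))
    den = abs(F(xt) - F(x))
    return num / den

pts = [1.8 + 0.04 * i for i in range(11)]          # grid in the ball B_0.2(2)
eta = max(tcc_ratio(x, xt) for x in pts for xt in pts if xt != x)
```

Shrinking the ball shrinks $\eta$ roughly linearly with the radius here, which matches the remark below that the smallness requirement reflects the local nonlinearity of $F$.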
Tangential cone condition-a remark-1

Theorem. Let the assumptions F0), F1) and F2) with $\nu = 2$ hold. Let $x, x^\dagger \in V_{x^\dagger}$ with $\|x\|_{V_{x^\dagger}}, \|x^\dagger\|_{V_{x^\dagger}} \le E$. Then we have
\[ \|F(x) - F(x^\dagger) - F'(x^\dagger)(x - x^\dagger)\|_Y \le c(E)\,\|F(x) - F(x^\dagger)\|_Y \]
for some constant $c(E)$, if $E$ is small enough.

This brings us nearer to the (TCC). The problem lies in the assumption $x, x^\dagger \in V_{x^\dagger}$ with $\|x\|_{V_{x^\dagger}}, \|x^\dagger\|_{V_{x^\dagger}} \le E$, since $V_x$ and $V_{x^\dagger}$ are not related by a range condition. The smallness of $E$ is related to the nonlinearity of $F$ in the neighborhood of $x^\dagger$.
Tangential cone condition again

Tangential cone condition (TCC), now with a weaker constant:
\[ \|F(\tilde x) - F(x) - F'(x)(\tilde x - x)\|_Y \le \eta\,\|F(\tilde x) - F(x)\|_Y, \quad \tilde x, x \in B_\rho^X(x_0), \]
with $\eta \in (0, 1)$. In
A. Leitão and B.F. Svaiter: On a family of gradient type projection methods for nonlinear ill-posed problems. Working paper, 2016.
A. Leitão and B.F. Svaiter: On projective Landweber-Kaczmarz methods for solving systems of nonlinear ill-posed equations. Inverse Problems 32, 2016.
interesting sets based on the tangential cone condition are used to accelerate computational schemes for nonlinear ill-posed equations. These sets have a nice geometrical interpretation.