Regularization in Banach Space

Barbara Kaltenbacher, Alpen-Adria-Universität Klagenfurt

joint work with
Uno Hämarik, University of Tartu
Bernd Hofmann, Technical University of Chemnitz
Urve Kangro, University of Tartu
Christiane Pöschl, Alpen-Adria-Universität Klagenfurt
Elena Resmerita, Alpen-Adria-Universität Klagenfurt
Otmar Scherzer, University of Vienna
Frank Schöpfer, University of Oldenburg
Thomas Schuster, Saarland University
Ivan Tomba, University of Bologna

PICOF 2014, Hammamet, Tunisia, 8 May 2014
Outline

- Motivation: parameter identification in PDEs
  - examples
  - motivation for using Banach spaces
- Regularization in Banach space
  - variational methods
  - iterative methods
Motivation: Parameter Identification in PDEs
Parameter identification in PDEs: Examples (I)

e.g., electrical impedance tomography (EIT):
$\nabla\cdot(\sigma\nabla\varphi) = 0$ in $\Omega$.
Identify the conductivity $\sigma$ from measurements of the Dirichlet-to-Neumann map $\Lambda_\sigma$, i.e., all possible pairs $(\varphi, \sigma\,\partial_n\varphi)$ on $\partial\Omega$.

e.g., quantitative thermoacoustic tomography (qTAT):
$\nabla\times(\mu^{-1}\nabla\times E) + \sigma\,\partial_t E + \varepsilon\,\partial_t^2 E = J$ in $\Omega$.
Identify $\sigma$ from measurements of the deposited energy $\sigma |E|^2$ in $\Omega$.
Parameter identification in PDEs: Examples (II)

e.g., a-example (transmissivity in groundwater modelling):
$\nabla\cdot(a\nabla u) = 0$ in $\Omega$.
Identify $a$ from measurements of $u$ in $\Omega$.

e.g., c-example (potential in the stationary Schrödinger equation):
$-\Delta u + c\,u = 0$ in $\Omega$.
Identify $c$ from measurements of $u$ in $\Omega$.
Abstract Formulation

Identify the parameter $x$ in the PDE $A(x, u) = f$ from measurements of the state $u$, $C(u) = y$, where
$x \in X$, $u \in V$, $y \in Y$; $X, V, Y$ ... Banach spaces;
$A : X \times V \to W$ ... differential operator;
$C : V \to Y$ ... observation operator.

(a) reduced approach: operator equation for $x$
$F(x) = y$, $F = C \circ S$ with $S : X \to V$, $x \mapsto u$ the parameter-to-state map

(b) all-at-once approach: measurements and PDE as a system for $(x, u)$
$C(u) = y$ in $Y$,
$A(x, u) = f$ in $W$,
i.e., $\mathbf{F}(x, u) = \mathbf{y}$
Nonlinear ill-posed operator equation

$F(x) = y$

$F : D(F)\,(\subseteq X) \to Y$ ... nonlinear operator; $F$ not continuously invertible;
$X, Y$ ... Banach spaces;
$y^\delta \approx y$ ... noisy data, $\|y^\delta - y\| \le \delta$ ... noise level.

$\Rightarrow$ regularization necessary
Motivation for working in Banach space

- $X = L^P$ with $P = 1$: sparse solutions
- $Y = L^R$ with $R = 1$: impulsive noise
- $Y = L^R$ with $R = \infty$: uniform noise
- $X = L^P$ with $P = \infty$: ellipticity and boundedness in the context of parameter identification in PDEs (e.g., $\nabla\cdot(a\nabla u) = 0$); avoids the artificial increase of ill-posedness that would result from a Hilbert space choice $X = H^{d/2+\epsilon}$
- $Y = L^R$ with general $R$: realistic measurement noise model; avoids the artificial increase of ill-posedness that would result from a Hilbert space choice $Y = L^2$
Some Banach space tools (I)

Smoothness:
- $X$ smooth $\Leftrightarrow$ norm Gâteaux differentiable on $X \setminus \{0\}$;
- $X$ uniformly smooth $\Leftrightarrow$ norm Fréchet differentiable on the unit sphere.

Convexity:
- $X$ strictly convex $\Leftrightarrow$ boundary of the unit ball contains no line segment;
- $X$ uniformly convex $\Leftrightarrow$ modulus of convexity $\delta_X(\epsilon) > 0$ for all $\epsilon \in (0, 2]$, where
  $\delta_X(\epsilon) = \inf\{1 - \|\tfrac{1}{2}(x + y)\| : \|x\| = \|y\| = 1,\ \|x - y\| \ge \epsilon\}$.

$L^P(\Omega)$, $P \in (1, \infty)$, is uniformly convex (Hanner's inequality) and uniformly smooth.
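To make the modulus of convexity tangible, here is a minimal numerical sketch (not from the talk): it uses the finite-dimensional space $\ell^P$ on $\mathbb{R}^d$ as a stand-in for $L^P(\Omega)$ and estimates the infimum by random sampling, which only bounds $\delta_X(\epsilon)$ from above. The helper name `modulus_of_convexity` is illustrative.

```python
import numpy as np

def modulus_of_convexity(P, eps, d=2, samples=200_000, seed=0):
    # Sample pairs (x, y) on the unit sphere of l^P in R^d and evaluate
    # 1 - ||(x + y)/2|| over pairs with ||x - y|| >= eps; the minimum over
    # samples approximates the infimum delta_X(eps) from above.
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((samples, d))
    y = rng.standard_normal((samples, d))
    x /= np.linalg.norm(x, ord=P, axis=1, keepdims=True)
    y /= np.linalg.norm(y, ord=P, axis=1, keepdims=True)
    far = np.linalg.norm(x - y, ord=P, axis=1) >= eps
    mids = np.linalg.norm((x[far] + y[far]) / 2.0, ord=P, axis=1)
    return 1.0 - mids.max()

# Hilbert case P = 2: closed form delta(eps) = 1 - sqrt(1 - eps^2/4) ~ 0.134 at eps = 1
print(modulus_of_convexity(2, 1.0))
print(modulus_of_convexity(4, 1.0))  # positive for every P in (1, infty), by Hanner
```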
Some Banach space tools (II)

Dual space: $X^* = L(X, \mathbb{R})$ ... bounded linear functionals on $X$,
$x^* : x \mapsto \langle x^*, x\rangle$

- $X^*$ uniformly smooth $\Leftrightarrow$ $X$ uniformly convex
- $X$ reflexive: $X^*$ smooth $\Leftrightarrow$ $X$ strictly convex
Some Banach space tools (III)

Duality mapping:
$J_p : X \to 2^{X^*}$, $J_p(x) = \{x^* \in X^* : \langle x^*, x\rangle = \|x^*\|\,\|x\| = \|x\|^p\}$

- $J_p$ set valued; $j_p$ ... single valued selection of $J_p$;
- $J_p = \partial\bigl(\tfrac{1}{p}\|\cdot\|^p\bigr)$ (Asplund)
- $X$ smooth $\Leftrightarrow$ $J_p$ single valued
- $X$ reflexive, smooth, strictly convex $\Rightarrow$ $J_p^{-1} = J^{X^*}_{p^*}$ with $p^* = \tfrac{p}{p-1}$

$L^P(\Omega)$: $J_p(x) = \|x\|_{L^P}^{p-P}\,|x|^{P-1}\,\mathrm{sign}(x)$

$W^{1,P}(\Omega)$: $J_p(x) = \|x\|_{W^{1,P}}^{p-P}\left(-\nabla\cdot(|\nabla x|^{P-2}\nabla x) + |x|^{P-1}\,\mathrm{sign}(x)\right)$,
$P \in (1, \infty)$, $\tfrac{\partial x}{\partial n} = 0$ on $\partial\Omega$

$J_p$ is possibly nonlinear and nonsmooth.
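A minimal sketch of the $L^P$ duality mapping above, assuming $L^P(0,1)$ is discretized on a uniform grid with quadrature weight $h$ (the helper names are illustrative); the two prints check the defining properties $\langle J_p(x), x\rangle = \|x\|^p$ and $\|J_p(x)\|_{L^{P^*}} = \|x\|^{p-1}$ up to quadrature error.

```python
import numpy as np

def lp_norm(x, P, h):
    return (h * np.sum(np.abs(x) ** P)) ** (1.0 / P)

def duality_map(x, p, P, h):
    # J_p on L^P: J_p(x) = ||x||_{L^P}^{p-P} |x|^{P-1} sign(x)
    nrm = lp_norm(x, P, h)
    if nrm == 0.0:
        return np.zeros_like(x)          # J_p(0) = {0}
    return nrm ** (p - P) * np.abs(x) ** (P - 1) * np.sign(x)

h = 1e-3
x = np.sin(2 * np.pi * np.arange(0.0, 1.0, h))
p, P = 2, 4
xs = duality_map(x, p, P, h)
print(h * np.sum(xs * x), lp_norm(x, P, h) ** p)                  # <J_p(x), x> = ||x||^p
print(lp_norm(xs, P / (P - 1), h), lp_norm(x, P, h) ** (p - 1))   # dual norm in L^{P*}
```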
Some Banach space tools (IV)

Bregman distance:
$D_p(x, \bar x) = \tfrac{1}{p}\|x\|^p - \tfrac{1}{p}\|\bar x\|^p - \inf\{\langle\xi, x - \bar x\rangle : \xi \in J_p(\bar x)\}$

- smooth and uniformly convex $X$: convergence in $D_p$ $\Leftrightarrow$ convergence in norm
- Hilbert space case: $D_2(x, \bar x) = \tfrac{1}{2}\|x - \bar x\|^2$

Bregman distance wrt a general convex functional $R : X \to (-\infty, +\infty]$:
$D_\xi(x, \bar x) = R(x) - R(\bar x) - \langle\xi, x - \bar x\rangle$ with $\xi \in \partial R(\bar x)$.
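A minimal sketch of the Bregman distance in the same discretized $L^P$ setting as before (again an illustration, not from the talk); the final print confirms that in the Hilbert case $p = P = 2$ it reduces to $\tfrac12\|x - \bar x\|^2$.

```python
import numpy as np

def lp_norm(x, P, h):
    return (h * np.sum(np.abs(x) ** P)) ** (1.0 / P)

def duality_map(x, p, P, h):
    nrm = lp_norm(x, P, h)
    return np.zeros_like(x) if nrm == 0.0 else nrm ** (p - P) * np.abs(x) ** (P - 1) * np.sign(x)

def bregman(x, x_bar, p, P, h):
    # D_p(x, x_bar) = (1/p)||x||^p - (1/p)||x_bar||^p - <J_p(x_bar), x - x_bar>
    return (lp_norm(x, P, h) ** p - lp_norm(x_bar, P, h) ** p) / p \
           - h * np.sum(duality_map(x_bar, p, P, h) * (x - x_bar))

h = 1e-3
t = np.arange(0.0, 1.0, h)
print(bregman(np.cos(t), np.sin(t), 2, 2, h),
      0.5 * lp_norm(np.cos(t) - np.sin(t), 2, h) ** 2)  # agree in the Hilbert case
```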
Regularization in Banach Space: Variational Methods
Tikhonov Regularization in Banach space

$F(x) = y$

$x_\alpha \in \operatorname{argmin}_{x \in D(F)} \|F(x) - y^\delta\|^r + \alpha\|x - x_0\|^p$,
$p, r \in (1, \infty)$, and $x_0$ ... initial guess;

more generally:
$x_\alpha \in \operatorname{argmin}_{x \in D(F)} S(F(x), y^\delta) + \alpha R(x)$
with convex functionals $S$, $R$.
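A minimal sketch of this variational formulation on a toy problem (an illustration with assumed names: a smoothing matrix `K` as a discretized ill-posed linear forward map, powers $r = 2$, $p = 1.1$, and a generic smooth optimizer; a serious implementation would exploit the convex structure, cf. the references below). The grid-weight factors of the $L^r$/$L^p$ norms are dropped, which only rescales $\alpha$.

```python
import numpy as np
from scipy.optimize import minimize

n = 60
t = np.linspace(0.0, 1.0, n)
K = np.exp(-100.0 * (t[:, None] - t[None, :]) ** 2) / n       # smoothing kernel -> ill-posed
x_true = np.maximum(0.0, 1.0 - 10.0 * np.abs(t - 0.5))        # spike-like exact solution
y_delta = K @ x_true + 1e-3 * np.random.default_rng(1).standard_normal(n)

def T_alpha(x, alpha, r=2.0, p=1.1):
    # ||Kx - y_delta||_r^r + alpha ||x - x0||_p^p with x0 = 0; p = 1.1 promotes sparsity
    return np.sum(np.abs(K @ x - y_delta) ** r) + alpha * np.sum(np.abs(x) ** p)

x_alpha = minimize(T_alpha, np.zeros(n), args=(1e-4,), method="L-BFGS-B").x
```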
Assumptions

- $X, Y$ Banach spaces; in addition to the norm topologies: weaker topologies $\tau_X$, $\tau_Y$
- $S$ symmetric and satisfying a generalized triangle inequality
  $S(y_1, y_3)^{1/r} \le C_S\bigl(S(y_1, y_2)^{1/r} + S(y_2, y_3)^{1/r}\bigr)$
- For all $(y_n)_{n\in\mathbb{N}} \subseteq Y$, $y, \tilde y \in Y$:
  - $S(y, y_n) \to 0$ $\Rightarrow$ $y_n \to y$ wrt $\tau_Y$
  - $S(y, y_n) \to 0$ $\Rightarrow$ $S(\tilde y, y_n) \to S(\tilde y, y)$
  - $(y, \tilde y) \mapsto S(y, \tilde y)$ is lower semicontinuous wrt $\tau_Y^2$
- $F : D(F) \subseteq X \to Y$ is continuous wrt $\tau_X$, $\tau_Y$.
- $R : X \to [0, +\infty]$ is proper, convex, $\tau_X$ lower semicontinuous.
- $D := D(F) \cap D(R)$ is closed wrt $\tau_X$.
- For all $\alpha > 0$, $M > 0$, the sublevel set $M_\alpha(M) = \{x \in D : T_\alpha(x) \le M\}$ is $\tau_X$ compact.

These assumptions are satisfied, e.g., for $S(y, \tilde y) = \|y - \tilde y\|^r$, $R(x) = \|x - x_0\|^p$.
Well-posedness

Theorem (Well-definedness; Hofmann & BK & Pöschl & Scherzer 07)
For any $\alpha > 0$, $y^\delta \in Y$ there exists a minimizer of $T_\alpha = S(F(\cdot), y^\delta) + \alpha R(\cdot)$.

Theorem (Stability; HKPS 07)
For any $\alpha > 0$, minimizers $x_\alpha(y^\delta)$ of $T_\alpha$ are subsequentially $\tau_X$-stable wrt the data $y^\delta \in Y$, i.e.,
$S(y^\delta, y_n) \to 0$ $\Rightarrow$ $x_\alpha(y_n) \to x_\alpha(y^\delta)$ wrt $\tau_X$
if $x_\alpha(y^\delta)$ is unique. Moreover, $R(x_\alpha(y_n)) \to R(x_\alpha(y^\delta))$.
Theorem (Convergence; HKPS 07)
Assume that a solution $x^\dagger$ of $F(x) = y$ exists. Then there exists an $R$-minimizing solution $x^\dagger$. Let the regularization parameter $\alpha$ be chosen in dependence of the noise level $\delta \ge S(y, y^\delta)^{1/r}$, $r > 1$, according to
$\alpha \to 0$ and $\dfrac{\delta^r}{\alpha} \to 0$ as $\delta \to 0$.
Then minimizers $x_\alpha^\delta = x_\alpha(y^\delta)$ of $T_\alpha$ converge $\tau_X$-subsequentially to $x^\dagger$, i.e.,
$S(y^{\delta_n}, y) \to 0$ $\Rightarrow$ $x_\alpha^{\delta_n} \to x^\dagger$ wrt $\tau_X$
if $x^\dagger$ is unique.
Theorem (Convergence Rates; HKPS 07)
Assume existence of an $R$-minimizing solution $x^\dagger$ of $F(x) = y$ satisfying the source condition $\partial R(x^\dagger) \cap \operatorname{range}(F'(x^\dagger)^*) \ne \emptyset$, i.e.,
$\exists\,\xi \in \partial R(x^\dagger)\ \exists\,w \in Y^* : \xi = F'(x^\dagger)^* w$,
and let $F$ satisfy the Lipschitz-type condition
$\|F'(x^\dagger)(x - x^\dagger)\| \le c_2\,S(F(x), F(x^\dagger))^{1/r} + c_1\,D_\xi(x, x^\dagger)$ for all $x \in D(F)$,
for some $c_1, c_2 > 0$, $\xi \in \partial R(x^\dagger)$, $r > 1$, with $c_1\|w\| < 1$.
Let the regularization parameter $\alpha$ be chosen in dependence of the noise level $\delta \ge S(y, y^\delta)^{1/r}$ according to $\alpha \sim \delta^{r-1}$.
Then minimizers $x_\alpha^\delta$ of $T_\alpha$ satisfy
$S(F(x_\alpha^\delta), y)^{1/r} = O(\delta)$, $D_\xi(x_\alpha^\delta, x^\dagger) = O(\delta)$.
Idea of proof: $s_\alpha^\delta := S(F(x_\alpha^\delta), y^\delta)$, $d_\alpha^\delta := D_\xi(x_\alpha^\delta, x^\dagger)$

- minimality of $x_\alpha(y^\delta)$:
  $\delta^r + \alpha R(x^\dagger) \ge T_\alpha(x^\dagger) \ge T_\alpha(x_\alpha^\delta) = s_\alpha^\delta + \alpha R(x_\alpha^\delta)$
- definition of the Bregman distance and the source condition:
  $\delta^r \ge s_\alpha^\delta + \alpha\bigl(R(x_\alpha^\delta) - R(x^\dagger)\bigr)
   = s_\alpha^\delta + \alpha\bigl(d_\alpha^\delta + \langle\xi, x^\dagger - x_\alpha^\delta\rangle\bigr)
   = s_\alpha^\delta + \alpha\bigl(d_\alpha^\delta + \langle w, F'(x^\dagger)(x^\dagger - x_\alpha^\delta)\rangle\bigr)$
- use the condition on $F$, the triangle and Young's inequalities ($\tfrac1r + \tfrac1{r'} = 1$):
  $\delta^r \ge s_\alpha^\delta + \alpha d_\alpha^\delta - \alpha\|w\|\bigl(c_1 d_\alpha^\delta + c_2\,S(F(x_\alpha^\delta), F(x^\dagger))^{1/r}\bigr)
   \ge s_\alpha^\delta + (1 - c_1\|w\|)\,\alpha d_\alpha^\delta - \tfrac12 s_\alpha^\delta - C_1\alpha^{r'} - C_2\alpha\delta$
- combine with $\alpha \sim \delta^{r-1} = \delta^{r/r'}$ to obtain
  $C\delta^r \ge \tfrac12 s_\alpha^\delta + (1 - c_1\|w\|)\,\delta^{r/r'} d_\alpha^\delta$
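The third step compresses a triangle-inequality and a Young-inequality estimate; spelled out as a sketch (the constants $C_1$, $C_2$ absorbing $c_2$, $C_S$, $\|w\|$ are not named on the original slide):

```latex
% generalized triangle inequality for S, with \delta \ge S(y, y^\delta)^{1/r}:
%   S(F(x_\alpha^\delta), F(x^\dagger))^{1/r} \le C_S\bigl((s_\alpha^\delta)^{1/r} + \delta\bigr)
% weighted Young inequality  ab \le \tfrac12 b^r + \tfrac{1}{r'}(\tfrac{2}{r})^{r'/r} a^{r'},
% applied with a = c_2 C_S \|w\|\,\alpha  and  b = (s_\alpha^\delta)^{1/r}:
\[
  c_2\,\alpha\,\|w\|\,S(F(x_\alpha^\delta), F(x^\dagger))^{1/r}
  \;\le\; \tfrac12\, s_\alpha^\delta
        + \underbrace{\tfrac{1}{r'}\bigl(\tfrac{2}{r}\bigr)^{r'/r}(c_2 C_S\|w\|)^{r'}}_{=:\,C_1}\,\alpha^{r'}
        + \underbrace{c_2 C_S\|w\|}_{=:\,C_2}\,\alpha\,\delta .
\]
```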
Variational methods in Banach space: Further References

- case $Y = X$: [Plato 92, 94, 95; Bakushinskii & Kokurin 04]
- general $X$, $Y$: [Burger & Osher 04; Resmerita & Scherzer 06; Pöschl 08; Flemming 11; Hofmann & BK & Pöschl & Scherzer 07; Grasmair & Haltmeier & Scherzer 08, 10; Neubauer & Hein & Hofmann & Kindermann & Tautenhahn 10; Grasmair 10, 11; ...]
Regularization in Banach Space: Iterative Methods
Assumptions on pre-image and image space X, Y

- $X$ smooth, uniformly convex
  $\Rightarrow$ $X$ reflexive (Milman-Pettis) and strictly convex
  $\Rightarrow$ $J_p$ single valued, norm-to-weak continuous, bijective
- $Y$ arbitrary Banach space
- some conditions on $F$
Iteratively Regularized Gauss-Newton Method (IRGNM) in Banach space

$F(x) = y$

$x_{k+1} \in \operatorname{argmin}_{x \in D(F)} \|F'(x_k)(x - x_k) + F(x_k) - y^\delta\|^r + \alpha_k\|x - x_0\|^p$,
$p, r \in (1, \infty)$, and $x_0$ ... initial guess;

convex minimization problem: for efficient solution see, e.g., [Bonesky, Kazimierski, Maass, Schöpfer, Schuster 07]

for comparison, the IRGNM in Hilbert space, $r = p = 2$:
$x_{k+1} = x_k - (F'(x_k)^* F'(x_k) + \alpha_k I)^{-1}\bigl(F'(x_k)^*(F(x_k) - y^\delta) + \alpha_k(x_k - x_0)\bigr)$
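A minimal sketch of one IRGNM step in the Hilbert space case $r = p = 2$ shown above (assumptions: the Jacobian is available as a dense matrix, and the linear system is small enough to solve directly):

```python
import numpy as np

def irgnm_step(F, dF, x_k, x0, y_delta, alpha_k):
    # x_{k+1} = x_k - (J^T J + alpha_k I)^{-1} (J^T (F(x_k) - y_delta) + alpha_k (x_k - x0))
    J = dF(x_k)                                            # Jacobian F'(x_k)
    g = J.T @ (F(x_k) - y_delta) + alpha_k * (x_k - x0)
    return x_k - np.linalg.solve(J.T @ J + alpha_k * np.eye(x_k.size), g)
```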
Stopping rule

discrepancy principle:
$k_*(\delta) = \min\{k \in \mathbb{N} : \|F(x_k) - y^\delta\| \le C_{dp}\,\delta\}$,
$C_{dp} > 1$, $\|y^\delta - y\| \le \delta$ ... noise level

Trade-off between stability and approximation: stop as early as possible (stability) such that the residual is below the noise level (approximation).
Choice of $\alpha_k$

discrepancy type principle:
$\underline{\theta}\,\|F(x_k) - y^\delta\| \le \|F'(x_k)(x_{k+1}(\alpha) - x_k) + F(x_k) - y^\delta\| \le \bar\theta\,\|F(x_k) - y^\delta\|$,
$0 < \underline{\theta} < \bar\theta < 1$

Trade-off between stability and approximation: choose $\alpha_k$ as large as possible (stability) such that the predicted residual is smaller than the old one (approximation).

see also: inexact Newton methods (for inverse problems: [Hanke 97, Rieder 99, 01])
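A minimal sketch of these two rules together (an illustration with assumed names; the geometric search only enforces the upper bound $\bar\theta$, whereas the lower bound $\underline{\theta}$ would be enforced by not decreasing $\alpha$ beyond the corridor):

```python
def choose_alpha(pred_residual, res_k, theta_hi=0.8, alpha=1.0, q=0.5, max_tries=50):
    # Decrease alpha geometrically until the predicted linearized residual
    # pred_residual(alpha) = ||F'(x_k)(x_{k+1}(alpha) - x_k) + F(x_k) - y_delta||
    # drops below theta_hi * ||F(x_k) - y_delta||.
    for _ in range(max_tries):
        if pred_residual(alpha) <= theta_hi * res_k:
            return alpha          # largest tried alpha meeting the bound
        alpha *= q                # predicted residual too close to the old one
    return alpha

def stop(res_k, delta, C_dp=1.5):
    # discrepancy principle: stop at the first k with residual below C_dp * delta
    return res_k <= C_dp * delta
```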
Convergence of the IRGN

Theorem (BK & Schöpfer & Schuster 09)
For all $k \le k_*(\delta) - 1$ the iterates
$x_{k+1} \in \operatorname{argmin}_x \|F'(x_k)(x - x_k) + F(x_k) - y^\delta\|^r + \alpha_k\|x - x_0\|^p$
with $\alpha_k$ s.t. $\|F'(x_k)(x_{k+1} - x_k) + F(x_k) - y^\delta\| \le \bar\theta\,\|F(x_k) - y^\delta\|$
are well-defined, and
$x_{k_*(\delta)} \to x^\dagger$ solution to $F(x) = y$ as $\delta \to 0$
if $x^\dagger$ is unique (and along subsequences otherwise).
Convergence rates for the IRGN

Theorem (BK & Hofmann 10)
Under the source condition $J_p(x^\dagger - x_0) \in \operatorname{range}(F'(x^\dagger)^*)$, i.e.,
$\exists\,\xi \in J_p(x^\dagger - x_0),\ w \in Y^* : \xi = F'(x^\dagger)^* w$,
$x_{k_*}$ converges at the optimal rate
$D_p(x_{k_*} - x_0, x^\dagger - x_0) = O(\delta)$.

for comparison, the Hilbert space case: $\|x_{k_*} - x^\dagger\| = O(\sqrt{\delta})$, i.e., $D_p(x_{k_*} - x_0, x^\dagger - x_0) = O(\delta)$
Remarks

- the rates result can be extended to $D_p^{x_0}(x_{k_*}, x^\dagger) = O\bigl(\delta^{\frac{4\nu}{2\nu+1}}\bigr)$ with $\nu \in (0, \tfrac12)$ under regularity assumptions (variational source conditions):
  $\exists\,\beta > 0\ \forall x \in \mathcal{B}:\ \langle J_p(x^\dagger - x_0), x^\dagger - x\rangle \le \beta\,D_p^{x_0}(x, x^\dagger)^{\frac{1-2\nu}{2}}\,\|F'(x^\dagger)(x - x^\dagger)\|^{2\nu}$
- logarithmic rates under logarithmic source conditions
- rates can be shown alternatively with an a priori choice of $\alpha_k$ and $k_*$ instead of the discrepancy principle; this needs a priori information on the smoothness index $\nu$ of $x^\dagger$, though
Newton-Landweber iterations

Combine an outer Newton iteration with an inner regularized Landweber iteration (in place of Tikhonov):

For $k = 0, 1, 2, \ldots, k_*$ do (Newton)
  $u_{k,0} = 0$, $z_{k,0} = x_k$
  For $n = 0, 1, 2, \ldots, n_k$ do (Landweber)
    $u_{k,n+1} = u_{k,n} - \alpha_{k,n}\bigl(J_p^X(x_k - x_0) + u_{k,n}\bigr) - \omega_{k,n}\,F'(x_k)^*\,j_r^Y\bigl(F'(x_k)(z_{k,n} - x_k) + F(x_k) - y^\delta\bigr)$
    $z_{k,n+1} = x_0 + (J_p^X)^{-1}\bigl(u_{k,n+1} + J_p^X(x_k - x_0)\bigr)$
  $x_{k+1} = z_{k,n_k}$
Newton-Landweber iterations

For comparison, the Hilbert space case: outer Newton iteration with inner regularized Landweber iteration (in place of Tikhonov):

For $k = 0, 1, 2, \ldots, k_*$ do (Newton)
  $u_{k,0} = 0$
  For $n = 0, 1, 2, \ldots, n_k$ do (Landweber)
    $u_{k,n+1} = u_{k,n} - \underbrace{\alpha_{k,n}(u_{k,n} + x_k - x_0)}_{\text{additional regularization}} - \omega_{k,n}\,F'(x_k)^*\underbrace{\bigl(F'(x_k)u_{k,n} + F(x_k) - y^\delta\bigr)}_{\text{Newton step residual}}$
  $x_{k+1} = x_k + u_{k,n_k}$
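A minimal sketch of this Hilbert-space variant ($p = r = 2$, so all duality mappings are identities; fixed $\alpha$, $\omega$ and fixed inner/outer iteration counts are simplifying assumptions, not the parameter choice discussed next):

```python
import numpy as np

def newton_landweber(F, dF, x0, y_delta, alpha, omega, n_inner, n_outer):
    x_k = x0.copy()
    for k in range(n_outer):                      # outer Newton loop
        J = dF(x_k)
        res = F(x_k) - y_delta
        u = np.zeros_like(x_k)
        for n in range(n_inner):                  # inner regularized Landweber loop
            u = (u
                 - alpha * (u + x_k - x0)         # additional regularization term
                 - omega * (J.T @ (J @ u + res))) # Landweber step on Newton residual
        x_k = x_k + u
    return x_k
```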
Choice of the parameters $\alpha_{k,n}$, $\omega_{k,n}$, $n_k$, $k_*$?
IRGNM-Landweber Algorithm

Algorithm (Newton iteratively regularized Landweber method)
Choose $C_{dp}$, $\tau$, $C_\alpha$ sufficiently large, $x_0$ sufficiently close to $x^\dagger$, $\alpha_{0,0} \le 1$, $\vartheta > 0$ sufficiently small, $\bar\omega > 0$, $(a_n)_{n\in\mathbb{N}_0}$ s.t. $\sum_{n=0}^\infty a_n < \infty$.

For $k = 0, 1, 2, \ldots$ until $r_k \le C_{dp}\,\delta$ do
  $u_{k,0} = 0$, $z_{k,0} = x_k$, $\alpha_{k,0} = \alpha_{k-1,n_{k-1}}$ if $k > 0$
  For $n = 0, 1, 2, \ldots$ until
    $n = n_k - 1 = a_k\,r_k^{-r}$ if $r_k > C_{dp}\,\delta$,
    $\alpha_{k,n_k} \le C_\alpha\,(r_k + \delta)^{\frac{r}{1+\theta}}$ if $r_k \le C_{dp}\,\delta$
  do
    $\omega_{k,n} = \vartheta\,\min\{t_{k,n}^{r(s-1)}\,\tilde t_{k,n}^{-s},\ t_{k,n}^{r(p-1)}\,\tilde t_{k,n}^{-p},\ \bar\omega\}$
    $u_{k,n+1} = u_{k,n} - \alpha_{k,n}\,J_p^X(z_{k,n} - x_0) - \omega_{k,n}\,F'(x_k)^*\,j_r^Y\bigl(F'(x_k)(z_{k,n} - x_k) + F(x_k) - y^\delta\bigr)$
    $z_{k,n+1} = x_0 + (J_p^X)^{-1}\bigl(J_p^X(x_k - x_0) + u_{k,n+1}\bigr)$
    $\alpha_{k,n+1} = \max\{\check\alpha_{k,n+1}, \hat\alpha_{k,n+1}\}$ with $\check\alpha_{k,n+1}$, $\hat\alpha_{k,n+1}$ as below
  $x_{k+1} = z_{k,n_k}$
with
$t_{k,n} = \|F'(x_k)(z_{k,n} - x_k) + F(x_k) - y^\delta\|$
$\tilde t_{k,n} = \|F'(x_k)^*\,j_r^Y\bigl(F'(x_k)(z_{k,n} - x_k) + F(x_k) - y^\delta\bigr)\|$
$r_k = \|F(x_k) - y^\delta\|$
$\theta = \dfrac{4\nu}{r(1 + 2\nu) - 4\nu}$
$\check\alpha_{k,n} = \tau\,\bigl(t_{k,n} + \eta\,r_k + (1 + \eta)\,\delta\bigr)^{\frac{r}{1+\theta}}$
$\hat\alpha_{k,n+1} = \alpha_{k,n}\bigl(1 - (1 - q)\,\alpha_{k,n}^{\theta}\bigr)^{1/\theta}$
$\nu$ ... smoothness index according to the (approximate) source condition

- a priori parameter choice
- a posteriori choice of $n_k$ might spoil strong convergence
Convergence of the IRGNM-Landweber

Theorem (BK & Tomba 12)
Assume: $X$ smooth and $s$-convex with $r \le s \le p$, $s \le 1 + \tfrac{1}{\theta}$.
Then the iterates according to the above algorithm are well-defined and converge strongly,
$x_{k_*(\delta)} \to x^\dagger$ solution to $F(x) = y$ as $\delta \to 0$
if $x^\dagger$ is unique (and along subsequences otherwise).
Optimal rates under variational source conditions.
Iterative methods in Banach space: Further References

- Newton-Landweber with $\alpha_{k,n} \equiv 0$: [Q. Jin 12]
- Landweber (without outer Newton loop): [Schöpfer & Louis & Schuster 06, Hein & Kazimierski 10, BK & Schöpfer & Schuster 09, BK 12]
- iterated Tikhonov regularization: [Q. Jin & Stals 12]
- ...
Numerical tests with IRGNM-Landweber

c-example: $-\Delta u + c\,u = 0$ in $\Omega$. Identify $c$ from measurements of $u$ in $\Omega$.

$\Omega = (0, 1) \subseteq \mathbb{R}$ and $\Omega = (0, 1)^2 \subseteq \mathbb{R}^2$;
- sparse $c$, Gaussian noise
- smooth $c$, impulsive noise
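A minimal sketch of a forward map for the 1-d c-example (assumptions, not from the talk: the PDE is read as $-u'' + c\,u = f$ on $(0,1)$ with homogeneous Dirichlet data, and a source term $f \equiv 1$ is added so the sketch has a nontrivial state; central finite differences on a uniform grid):

```python
import numpy as np

def forward(c_vals, f_vals, h):
    # Solve -u'' + c u = f on (0,1), u(0) = u(1) = 0, by central finite differences;
    # returns u at the interior grid points (the "measurements of u in Omega").
    m = c_vals.size
    A = (np.diag(2.0 / h**2 + c_vals)
         + np.diag(-np.ones(m - 1) / h**2, 1)
         + np.diag(-np.ones(m - 1) / h**2, -1))
    return np.linalg.solve(A, f_vals)

n = 200
h = 1.0 / n
x = np.linspace(h, 1.0 - h, n - 1)
c = 1.0 + 5.0 * (np.abs(x - 0.5) < 0.05)   # "sparse" bump on a constant background
u = forward(c, np.ones_like(x), h)
```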
1-d, sparse c, Gaussian noise

Figure: reconstructions for $\delta = 1\mathrm{e}{-3}$;
(a) $X = L^2$, $Y = L^2$: 4141 iterations (26303 iterations with $\alpha_{k,n} \equiv 0$), relative error 1.0e-1
(b) $X = L^{1.1}$, $Y = L^2$: 3110 iterations (5701 iterations with $\alpha_{k,n} \equiv 0$), relative error 0.5e-1
1-d, smooth c, impulsive noise

Figure: (a) noisy data, (b) $X = L^2$, $Y = L^2$, (c) $X = L^2$, $Y = L^{1.1}$;
(c): 278 iterations (3285 iterations with $\alpha_{k,n} \equiv 0$), relative error 0.5e-1
2-d, sparse c, Gaussian noise

Figure: $X = L^{1.1}$, $Y = L^2$; (a) exact solution, (b) $\delta = 1\mathrm{e}{-3}$, (c) $\delta = 1\mathrm{e}{-2}$
Summary and Outlook

- motivation for solving inverse problems in Banach spaces: more natural norms, possible reduction of ill-posedness, sparsity
- variational methods: generalization of Tikhonov regularization
- iterative methods: Newton, Newton-Landweber, ...
- regularization by discretization in Banach space [Hämarik & BK & Kangro & Resmerita 14]
Thank you for your attention!