ITERATIVELY REGULARIZED GAUSS-NEWTON METHOD FOR NONLINEAR INVERSE PROBLEMS WITH RANDOM NOISE

FRANK BAUER¹, THORSTEN HOHAGE², AXEL MUNK²
¹ Fuzzy Logic Laboratorium Linz-Hagenberg, Austria
² Georg-August Universität Göttingen, Germany

Abstract. We study the convergence of regularized Newton methods applied to nonlinear operator equations in Hilbert spaces when the data are perturbed by random noise. It is shown that the expected square error is bounded by a constant times the minimax rate of the corresponding linearized problem if the stopping index is chosen using a-priori knowledge of the smoothness of the solution. For unknown smoothness the stopping index can be chosen adaptively based on Lepskiĭ's balancing principle. For this stopping rule we establish an oracle inequality, which implies order optimal rates for deterministic errors, and optimal rates up to a logarithmic factor for random noise. The performance and the statistical properties of the proposed method are illustrated by Monte Carlo simulations.

AMS subject classifications. 65J22, 62G99, 65J20, 65J15

Key words. iterative regularization methods, nonlinear statistical inverse problems, convergence rates, a-posteriori stopping rules, oracle inequalities

1. Introduction. In this paper we study the solution of nonlinear ill-posed operator equations

    F(a) = u,    (1.1)

assuming that the exact data u are perturbed by random noise. Here F : D(F) ⊂ X → Y is a nonlinear operator between separable Hilbert spaces X and Y, which is Fréchet differentiable on its domain D(F). Whereas the solution of nonlinear operator equations by iterative regularization methods with deterministic errors has been studied intensively over the last decade (see Bakushinskii & Kokurin [2] and references therein), we are not aware of any convergence or convergence rate results for iterative regularization methods applied to nonlinear inverse problems with random noise, although in many practical examples where iterative methods are frequently applied the noise is random rather than deterministic. Vice versa, nonlinear regularization methods are rarely used in a statistical context, and our aim is to explore the potential use of these methods in classical fields of application such as econometrics [22] and financial statistics [10]. In particular, we will derive rates of convergence under rather general assumptions.

We will consider the case that the error in the data consists of both deterministic and stochastic parts, but we are mainly interested in the situation where the stochastic noise is dominant. More precisely, we assume that the measured data u^obs are described by a Hilbert space process

    u^obs = F(a†) + δη + σξ,    (1.2a)

where δ ≥ 0 describes the deterministic error level, η ∈ Y with ‖η‖ ≤ 1 denotes the normalized deterministic error, σ² ≥ 0 determines the variance of the stochastic noise, and ξ is a Hilbert space process in Y. We assume that ξ_φ := ⟨ξ, φ⟩ is a random variable with E[ξ_φ] = 0 and Var ξ_φ < ∞ for any test vector φ ∈ Y, and that the covariance operator Cov_ξ : Y → Y, characterized by ⟨Cov_ξ φ, ψ⟩_Y = E[ξ_φ ξ_ψ], satisfies

the normalization condition

    ‖Cov_ξ‖ ≤ 1.    (1.2b)

Note that in the case of white noise (Cov_ξ = I_Y) the noise ξ does not belong to the Hilbert space with probability 1; nevertheless this prominent situation is covered in our setting in the weak formulation above. For a further discussion of the noise model (1.2) we refer to [5], where it is shown that it incorporates several discrete noise models in which the data consist of a vector of n measurements of a function u at different points. In this case σ is proportional to n^{−1/2}, i.e. the limit n → ∞ corresponds to the limit σ → 0. A particular discrete noise model will be discussed in section 5.

In this paper we provide a convergence analysis for the class of generalized Gauss-Newton methods given by

    â_{k+1} := a_0 + g_{α_{k+1}}(F′[â_k]*F′[â_k]) F′[â_k]* ( u^obs − F(â_k) + F′[â_k](â_k − a_0) )    (1.3a)

with g_α(λ) := ((λ+α)^m − α^m) / (λ(λ+α)^m), corresponding to m-times iterated Tikhonov regularization with m ∈ ℕ. Standard Tikhonov regularization is included as the special case m = 1. For deterministic errors (i.e. σ = 0 in (1.2a)) the convergence of the iteration (1.3a) has been studied in [1, 2, 3]. Typically the explicit formula (1.3a) is not used in implementations; instead â_{k+1} is computed by solving m linear least squares problems (see Fig. 2.1; a small computational sketch of (1.3a) is also given at the end of this introduction). The advantage of iterated Tikhonov regularization over ordinary Tikhonov regularization is a higher qualification number (see [12]). For simplicity, we assume that the regularization parameters are chosen of the form

    α_k = α_0 q^k    (1.3b)

with q ∈ (0, 1) and α_0 > 0.

Let us comment on the differences when treating measurement errors as random instead of deterministic: From a practical point of view the most important difference is the choice of the stopping index. Whereas the discrepancy principle, as the most common deterministic stopping rule, works reasonably well for discrete random noise models with small data vectors, the performance of the discrepancy principle becomes arbitrarily bad as the size of the data vector increases. This is further discussed and numerically demonstrated in §5. The same holds true for the deterministic version of Lepskiĭ's balancing principle as studied for the iteration (1.3a) in [3]. From a theoretical point of view the rates of convergence are different for deterministic and random noise, and in the latter case they depend not only on the source condition, but also on the operator F and the covariance operator Cov_ξ of the noise.

Actually our analysis also provides an improvement of known results for purely deterministic errors, i.e. σ = 0. This is achieved by showing an oracle inequality, which is a well-established technique in statistics (cf. [8, 9]), but rarely used in numerical analysis so far. To our knowledge the only deterministic oracle inequality has been shown by Mathé & Pereverzev [19, 21] for Lepskiĭ's balancing principle for linear problems. Theorem 4.1 below is a generalization of this result to nonlinear problems. As shown in Remark 4.3, this provides error estimates which are better by an arbitrarily large factor than any known deterministic error estimates for any nonlinear inversion method in the limit δ → 0.

An important alternative to the iteratively regularized Gauss-Newton method is nonlinear Tikhonov regularization, for which convergence and convergence rate results for random noise have been obtained by O'Sullivan [23], Bissantz, Hohage & Munk [4], Loubes & Ludeña [17], and Hohage & Pricop [14]. In this paper we show order optimal rates of convergence under less restrictive assumptions on the operator than in [14, 17, 23], and for a range of smoothness classes instead of a single one as in [4].
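To make the preceding definitions concrete, here is a minimal computational sketch (in Python) of one step of (1.3a) for a finite-dimensional discretization, evaluated via the singular value decomposition of the Jacobian. All names (g_alpha, irgnm_step) and the use of dense NumPy arrays are our own illustrative assumptions, not part of the analysis; the implementation actually used in the paper computes the step via the least squares problems of Fig. 2.1.

    import numpy as np

    def g_alpha(lam, alpha, m):
        """Filter of m-times iterated Tikhonov regularization:
        g_alpha(lambda) = ((lambda+alpha)^m - alpha^m) / (lambda (lambda+alpha)^m).
        (The limit for lambda -> 0 is m/alpha; we assume lambda > 0 here.)"""
        return ((lam + alpha) ** m - alpha ** m) / (lam * (lam + alpha) ** m)

    def irgnm_step(F, Fprime, a_k, a0, u_obs, alpha, m=1):
        """One step of (1.3a):
        a_{k+1} = a0 + g_alpha(J*J) J* (u_obs - F(a_k) + J (a_k - a0)),
        with J = F'[a_k], evaluated via the SVD of J."""
        J = Fprime(a_k)
        rhs = u_obs - F(a_k) + J @ (a_k - a0)
        U, s, Vt = np.linalg.svd(J, full_matrices=False)
        # g_alpha(J*J) J* acts on singular components as g_alpha(s^2) * s
        return a0 + Vt.T @ (g_alpha(s ** 2, alpha, m) * s * (U.T @ rhs))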

The paper is organized as follows: In the following section we show that the total error can be decomposed into an approximation error, a propagated data error, and a nonlinearity error, and that the last error component is dominated by the sum of the first two error components (Lemma 2.2). This will be fundamental for the rest of this paper. In section 3 we prove order optimal rates of convergence if the smoothness of the solution is known and the stopping index is chosen appropriately. Adaptation to unknown smoothness by Lepskiĭ's balancing principle is discussed in section 4, and an oracle inequality for nonlinear inverse problems is shown. The paper is completed by numerical simulations for a parameter identification problem in a differential equation, which illustrate how well the theoretical rates of convergence are met and which compare the performance of the balancing principle and the discrepancy principle.

2. Error decomposition. In this section we will analyze the error

    E_k := â_k − a†    (2.1)

of the iteration (1.3). We set â_0 := a_0, i.e. E_0 = a_0 − a†. Since lower bounds for the expected square error are typically not available for nonlinear inverse problems, we compare our upper bounds on the error with lower bounds for the linearized inverse problem

    T a = u^obs_lin,    u^obs_lin = T a† + δη + σξ    (2.2)

with the operator T := F′[a†]. It is a fundamental observation due to Bakushinskii [1], which transfers directly from deterministic errors to random noise, that the total error

    E_{k+1} = E^app_{k+1} + E^noi_{k+1} + E^nl_{k+1}    (2.3a)

in (2.1) can be decomposed into an approximation error E^app_{k+1}, a propagated data noise error E^noi_{k+1}, and a nonlinearity error E^nl_{k+1} given by

    E^app_{k+1} := r_{α_{k+1}}(T*T) E_0,
    E^noi_{k+1} := g_{α_{k+1}}(T_k*T_k) T_k* (δη + σξ),    (2.3b)
    E^nl_{k+1} := g_{α_{k+1}}(T_k*T_k) T_k* ( F(a†) − F(â_k) + T_k E_k ) + ( r_{α_{k+1}}(T_k*T_k) − r_{α_{k+1}}(T*T) ) E_0.

Here T_k := F′[â_k] and r_α(λ) := 1 − λ g_α(λ) = α^m/(α+λ)^m. If F is linear, then T_k = T = F, and the Taylor remainder F(a†) − F(â_k) + T_k E_k vanishes. Hence E^nl_{k+1} = 0, i.e. the nonlinearity error vanishes for the linearized equation (2.2). This can also be seen as follows: If F is linear, then the iteration formula (1.3a) reduces to the nonrecursive formula â_k = a_0 + g_{α_k}(T*T)T*(u^obs_lin − T a_0), which is the underlying linear regularization method with initial guess a_0 and regularization parameter α_k applied to (2.2). The approximation error E^app_k agrees exactly in the linear and the nonlinear case, and the data noise error E^noi_k differs only by the operators T_k and T. The goal of the following analysis is to show that the nonlinearity error E^nl_k can be bounded in terms of sharp estimates of ‖E^app_k‖ and ‖E^noi_k‖ (Lemma 2.2).

Approximation error. We will assume that there exists w ∈ Y such that

    a_0 − a† = T* w with ‖w‖ ≤ ρ,    (2.4)

for some ρ > 0. This is equivalent to the existence of w̃ ∈ X with ‖w̃‖ ≤ ρ such that a_0 − a† = (T*T)^{1/2} w̃ (see [12, Prop. 2.18]). Later we will require ρ to be sufficiently small, which expresses the usual closeness condition on the initial guess required for the convergence of Newton's method. Note, however, that we do not only require smallness of a_0 − a† in the norm of X, but smallness in the stronger norm of the range T*(Y).

It is well known (see [12] or eq. (3.3) below) that under assumption (2.4) the approximation error of iterated Tikhonov regularization is bounded by

    ‖E^app_k‖ ≤ C_r ρ √α_k, k ∈ ℕ.    (2.5)

Moreover, the approximation error satisfies

    ‖E^app_{k+1}‖ ≤ ‖E^app_k‖ ≤ γ_app ‖E^app_{k+1}‖, k ∈ ℕ,    (2.6a)

with γ_app := q^{−m}. If α_0 is chosen sufficiently large, we also have

    ‖E_0‖ ≤ γ_app ‖E^app_1‖.    (2.6b)

All the inequalities in (2.6) can be reduced to inequalities for real-valued functions with the help of spectral theory [12]. The second inequality in (2.6a) follows from

    r_{α_k}(t) = ( α_k/(α_k + t) )^m ≤ q^{−m} ( α_{k+1}/(α_{k+1} + t) )^m = q^{−m} r_{α_{k+1}}(t), t ≥ 0,

and the first inequality holds since r_α(t) = (α/(α+t))^m is monotonically increasing in α. Finally, (2.6b) follows since r_{α_1}(t) = (qα_0/(qα_0 + t))^m ≥ q^m for all t ∈ [0, t̄] provided α_0 ≥ t̄/(1−q), where t̄ := ‖T*T‖.

Remark 2.1. Note that the second inequality in (2.6a) rules out regularization methods with infinite qualification, such as Landweber iteration, as alternatives to iterated Tikhonov regularization. Although the regularized Newton method (1.3a) also converges for such linear regularization methods ([2, 15]) and convergence is even faster for smooth solutions, the estimate (2.15) in Lemma 2.2 will be violated in general, as it contains the norm of the approximation error itself instead of an estimate. This estimate is crucial to achieve the improvement discussed in Remark 4.3. The results of this paper also hold true for other spectral regularization methods satisfying (2.6) and (2.12) below, but since iterated Tikhonov regularization is by far the most common choice, we have decided to restrict ourselves to this case for simplicity.

Propagated data noise error. The deterministic part of E^noi_k can be estimated by the well-known operator norm bound

    ‖g_α(T_k*T_k) T_k*‖ ≤ C_g/√α    (2.7)

with a constant C_g depending only on m. This bound cannot be used to estimate the stochastic part of E^noi_k, or more precisely the variance term

    V(a, α) := ‖g_α(F′[a]*F′[a]) F′[a]* ξ‖²

with a ∈ D(F) and α > 0, since ‖ξ‖ = ∞ almost surely for typical noise processes ξ such as white noise. We assume that there exists a known function φ_noi : (0, α_0] → (0, ∞) such that

    ( E[V(a, α)] )^{1/2} ≤ φ_noi(α) for α ∈ (0, α_0] and a ∈ D(F).    (2.8a)

Such a condition is satisfied if F′[a] is Hilbert-Schmidt for all a ∈ D(F) and the singular values of these operators have the same rate of decay for all a ∈ D(F). Estimates of the form (2.8a) have been derived for spectral regularization methods under general assumptions in [5]. We further assume that

    1 < γ̲_noi ≤ φ_noi(α_{k+1})/φ_noi(α_k) ≤ γ̄_noi, k ∈ ℕ,    (2.8b)

for some constants γ̲_noi, γ̄_noi < ∞. Moreover, we assume an exponential inequality of the form

    P{ V(a, α) ≥ τ E[V(a, α)] } ≤ c_1 e^{−c_2 τ} for all a ∈ D(F), α ∈ (0, α_0], τ ≥ 1    (2.8c)

with constants c_1, c_2 > 0. Such an exponential inequality is derived for Gaussian noise processes ξ in the appendix. If ξ is white noise and the singular values of F′[a] decay at the rate σ_j(F′[a]) ~ j^{−β} with β > 1/2 uniformly for all a ∈ D(F), then we can choose φ_noi of the form

    φ_noi(α) = C_noi α^{−c},    (2.9)

with c := 1/2 + 1/(4β) and some constant C_noi ∈ (0, ∞) (see [5]). For colored noise c may also take values smaller than 1/2.

By virtue of the exponential inequality (2.8c) the probability is very small that the stochastic propagated data noise error V(â_k, α_k) at any Newton step is much larger than the expected value E[V(â_k, α_k)]. We will distinguish between a good case, in which the propagated data noise is small at all Newton steps, and a bad case, in which the noise is large in at least one Newton step. The good case is analyzed in Lemma 2.2 below. The proof of Theorem 3.2 will require a rather elaborate distinction of what is small and what is large in order to derive the optimal rate of convergence. This distinction will be described by functions τ(k, σ) satisfying

    τ(k+1, σ) φ_noi(α_{k+1})² ≥ τ(k, σ) φ_noi(α_k)², σ > 0, k = 1, 2, ....    (2.10a)

For given τ = τ(k, σ) and a given maximal iteration number K, the good event A_{τ,K} is that all iterates â_k belong to D(F) for k = 1, ..., K (and hence are well defined) and the propagated data noise error is bounded by ‖E^noi_k‖ ≤ Φ_noi(k) for k = 1, ..., K, with

    Φ_noi(k) := √(τ(k, σ)) σ φ_noi(α_k) + δ C_g/√α_k.    (2.10b)

Nonlinearity error. In the following we will assume that the Fréchet derivative F′ of F satisfies a Lipschitz condition with constant L > 0, i.e.

    ‖F′[a_1] − F′[a_2]‖ ≤ L ‖a_1 − a_2‖ for all a_1, a_2 ∈ D(F).    (2.11)

If the straight line connecting a† and â_k is contained in D(F), the Taylor remainder in (2.3) is bounded as ‖F(a†) − F(â_k) + T_k E_k‖ ≤ (L/2)‖E_k‖². Now it follows from (2.7) that the first term in the definition of E^nl_{k+1} is bounded by (C_g L/2) α_{k+1}^{−1/2} ‖E_k‖². To bound the second term in the definition of E^nl_{k+1}, we need the assumption (2.4) and the estimate

    ‖( r_α(T_k*T_k) − r_α(T*T) )(T*T)^{1/2}‖ ≤ C_nl ‖T_k − T‖,    (2.12)

which can be shown with the help of the Riesz-Dunford formula, see Bakushinskii & Kokurin [2, section 4.1]. Using (2.12), the source condition (2.4), and (2.11), the second term can be bounded by C_nl ρ ‖E_k‖ with a constant C_nl > 0. In summary we obtain the following recursive estimate of the nonlinearity error:

    ‖E^nl_{k+1}‖ ≤ (L C_g / (2 √α_{k+1})) ‖E_k‖² + C_nl ρ ‖E_k‖.    (2.13)

Lemma 2.2. Assume that the ball B_{2R}(a_0) with center a_0 and radius 2R > 0 is contained in D(F) and that (1.2), (1.3), (2.8), and (2.11) hold true. Moreover, let α_0 be sufficiently large that (2.6b) is satisfied. Then there exists ρ̄ > 0, specified in the proof, such that the following holds true for all a† ∈ B_R(a_0) satisfying the source condition (2.4) with ρ ≤ ρ̄, and all δ, σ ≥ 0: If (2.10) defining the good event is satisfied with K = k_max defined by

    k_max := max{ k ∈ ℕ : Φ_noi(k) α_k^{−1/2} ≤ C_stop }, 0 < C_stop ≤ min( 1/(8LC_g), R/(4√α_0) ),    (2.14)

then â_k ∈ B_R(a†) and

    ‖E^nl_k‖ ≤ γ_nl ( ‖E^app_k‖ + Φ_noi(k) ) for all k = 1, ..., k_max.    (2.15)

Here γ_nl := 8 L C_g C_stop satisfies γ_nl ≤ 1.

Proof. We prove this by induction in k, starting with the induction step. If the assertion holds true for k−1 (k = 2, ..., k_max), then we obtain from (2.3a), (2.6a), and (2.10) that

    ‖E_{k−1}‖ ≤ (1 + γ_nl)( ‖E^app_{k−1}‖ + Φ_noi(k−1) ) ≤ (1 + γ_nl)( γ_app ‖E^app_k‖ + Φ_noi(k) ).    (2.16)

Hence, it follows from (2.13) and the inequality (x+y)² ≤ 2x² + 2y² that

    ‖E^nl_k‖ ≤ C_nl ρ (1+γ_nl)( γ_app ‖E^app_k‖ + Φ_noi(k) ) + (L C_g/√α_k)(1+γ_nl)²( γ²_app ‖E^app_k‖² + Φ²_noi(k) ).    (2.17)

If ρ ≤ γ_nl / ( 2 C_nl (1+γ_nl) γ_app ), the first term on the right hand side of (2.17) is bounded by (γ_nl/2)( ‖E^app_k‖ + Φ_noi(k) ). Now we estimate the second term in (2.17). Using (2.5) we obtain

    ‖E^app_k‖/√α_k ≤ C_r ρ ≤ γ_nl / ( 2 L C_g (1+γ_nl)² γ²_app )

for ρ sufficiently small, so (L C_g/√α_k)(1+γ_nl)² γ²_app ‖E^app_k‖² ≤ (γ_nl/2) ‖E^app_k‖. Moreover, it follows from the inequality γ_nl ≤ 1 and the definition of k_max that

    (L C_g/√α_k)(1+γ_nl)² Φ²_noi(k) ≤ 4 L C_g Φ²_noi(k)/√α_k ≤ 4 L C_g C_stop Φ_noi(k) ≤ (γ_nl/2) Φ_noi(k).

Therefore, the second term in (2.17) is also bounded by (γ_nl/2)( ‖E^app_k‖ + Φ_noi(k) ), which yields (2.15) for k. This can be used to show that â_k ∈ D(F): If ρ is sufficiently small, then ‖E^app_k‖ ≤ C_r ρ √α_0 ≤ R/(2 + 2γ_nl). Moreover, it follows from the monotonicity of Φ_noi (cf. (2.10)), (2.14), and γ_nl ≤ 1 that

    Φ_noi(k) ≤ Φ_noi(k_max) ≤ C_stop α^{1/2}_{k_max} ≤ C_stop α^{1/2}_0 ≤ R/4 ≤ R/(2 + 2γ_nl).

Together with (2.15) this shows that ‖E_k‖ ≤ (1 + γ_nl)( ‖E^app_k‖ + Φ_noi(k) ) ≤ R, i.e. â_k ∈ B_R(a†) ⊂ D(F). It remains to establish the assertion for k = 1. Due to assumption (2.6b), ‖E_0‖ is bounded by the right hand side of (2.16) with k = 1. Now the assertion for k = 1 follows as above. □

Since we do not assume the stochastic noise ξ to be bounded, there is a positive probability that â_k ∉ D(F) at each Newton step. Therefore, we stop the iteration if ‖â_k − a_0‖ ≥ 2R for some k, and we choose the initial guess a_0 as estimator of a† in this case. The algorithm is summarized in Fig. 2.1.

    Input: u^obs, δ, σ, F, a_0, R, k_max
    k := 0; â_0 := a_0
    while k < k_max and ‖â_k − a_0‖ ≤ 2R
        â^{(0)}_{k+1} := a_0
        for j = 1, ..., m
            â^{(j)}_{k+1} := argmin_{a ∈ X} ‖F′[â_k](a − â_k) + F(â_k) − u^obs‖²_Y + α_{k+1} ‖a − â^{(j−1)}_{k+1}‖²_X
        end
        â_{k+1} := â^{(m)}_{k+1}; k := k + 1
    end
    if ‖â_k − a_0‖ > 2R: k̄ := 0
    else: choose k̄ ∈ {0, 1, ..., k_max} by (3.4), (3.7), or (4.3), resp.
    Output: â_k̄

    Fig. 2.1. Algorithm: Iteratively regularized Gauss-Newton method with m-times iterated Tikhonov regularization for random noise.
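The least squares formulation of Fig. 2.1 translates directly into code for a discretized problem. The following Python sketch is an illustration under our own assumptions (dense Jacobians, callables F and Fprime returning NumPy arrays); each inner step solves the regularized normal equations of the minimization problem in Fig. 2.1.

    import numpy as np

    def irgnm(F, Fprime, u_obs, a0, R, k_max, alpha0=1.0, q=0.5, m=1):
        """Sketch of Fig. 2.1: m-times iterated Tikhonov regularization applied
        to the linearization around the current Newton iterate. Returns all
        iterates; a stopping index is selected afterwards by (3.4), (3.7) or (4.3)."""
        iterates = [a0.copy()]
        a_k, k = a0.copy(), 0
        n = a0.size
        while k < k_max and np.linalg.norm(a_k - a0) <= 2 * R:
            alpha = alpha0 * q ** (k + 1)
            J = Fprime(a_k)                      # J = F'[a_k]
            r = u_obs - F(a_k) + J @ a_k         # then ||J a - r|| = ||F'[a_k](a-a_k)+F(a_k)-u_obs||
            JtJ, I = J.T @ J, np.eye(n)
            a_prev = a0.copy()                   # inner initial guess a^{(0)}_{k+1} = a0
            for _ in range(m):
                # a^{(j)} = argmin ||J a - r||^2 + alpha ||a - a^{(j-1)}||^2
                a_prev = np.linalg.solve(JtJ + alpha * I, J.T @ r + alpha * a_prev)
            a_k = a_prev
            iterates.append(a_k.copy())
            k += 1
        return iterates

For m = 1 this reproduces the explicit formula (1.3a), since (JᵀJ + αI)^{−1}(Jᵀr + α a_0) = a_0 + (JᵀJ + αI)^{−1}Jᵀ(r − J a_0).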

3. Convergence results for known smoothness. In the following we will assume more smoothness of a_0 − a† than in (2.4). This is expressed in terms of source conditions of the form

    a_0 − a† = Λ(T*T) w,

where Λ is continuous and monotonically increasing with Λ(0) = 0 (see [20] for the linear case). If

    sup_{t ∈ [0, t̄]} Λ(t) r_α(t) ≤ C_Λ Λ(α), α ∈ [0, α_0],    (3.1)

then ‖E^app_k‖ ≤ C_Λ Λ(α_k) ‖w‖. The most important case is Λ(t) = t^μ, and the condition

    a_0 − a† = (T*T)^μ w with ‖w‖ ≤ ρ    (3.2)

is called a Hölder source condition. The largest number μ_0 > 0 for which (3.1) holds true with this choice of Λ is called the qualification μ_0 of the linear regularization method. The qualification of Tikhonov regularization is μ_0 = 1, and the qualification of m-times iterated Tikhonov regularization is μ_0 = m (see [12]). We obtain

    ‖E^app_k‖ ≤ C_μ α_k^μ ρ for 0 ≤ μ ≤ μ_0.    (3.3)

Let us first consider deterministic errors, i.e. σ = 0. The following result shows that the same rate of convergence can be achieved for the nonlinear inverse problem (1.2) as for the linearized problem (2.2).

Theorem 3.1 (deterministic errors). Let the assumptions of Lemma 2.2 hold true with σ = 0, and let

    k̄ := min{ k_max, k* }, k* := argmin_{k ∈ ℕ} ( ‖E^app_k‖ + δ C_g/√α_k ).    (3.4)

Then there exist constants C, δ_0 > 0 such that

    ‖â_k̄ − a†‖ ≤ C inf_{k ∈ ℕ} ( ‖E^app_k‖ + δ C_g/√α_k ) for all δ ∈ (0, δ_0].    (3.5)

(Note that E^app_k = r_{α_k}(T*T)(a_0 − a†) is well defined for all k ∈ ℕ even if â_k is not well defined for all k.)

Proof. If k* ≤ k_max, then (3.5) follows with Φ_noi(k) = δ C_g/√α_k and C = 1 + γ_nl from Lemma 2.2 and the error decomposition (2.3). On the other hand, using (2.5) and (2.14) we get

    ‖E^app_{k*}‖ + Φ_noi(k*) ≥ Φ_noi(k*) ≥ Φ_noi(k_max + 1) > C_stop √α_{k_max+1} ≥ √q ( 1 + C_r ρ/C_stop )^{−1} ( ‖E^app_{k_max}‖ + Φ_noi(k_max) )

for k* > k_max. Therefore, (3.5) holds true with C = q^{−1/2}(1 + C_r ρ/C_stop) if k* > k_max and hence k̄ = k_max. □

In particular, for Hölder source conditions (3.2) with μ ∈ [1/2, μ_0], plugging (3.3) into (3.5) we obtain the rate

    ‖â_k̄ − a†‖ = O( ρ^{1/(2μ+1)} δ^{2μ/(2μ+1)} ), δ → 0,    (3.6)

first shown by Bakushinskii [1] for μ = 1 and by Blaschke, Neubauer & Scherzer [6] for μ ∈ [1/2, 1]. Explicit rates for other source conditions follow analogously. We now turn to the general noise model (1.2).

Theorem 3.2 (general noise model). Consider an iterative regularization method described by (1.3) and a noise model satisfying (1.2) and (2.8). Assume that B_{2R}(a_0) ⊂ D(F), that F′ satisfies (2.11), and that a Hölder source condition (3.2) with μ ∈ (1/2, μ_0] and ρ sufficiently small holds true. Define

    k* := argmin_{k ∈ ℕ} ( ‖E^app_k‖ + δ C_g/√α_k + σ φ_noi(α_k) ).    (3.7)

If ‖â_k − a_0‖ ≤ 2R for k = 1, ..., k*, set k̄ := k*, otherwise k̄ := 0. Then the expected square error of the regularized Newton iteration converges at the same rate as the expected square error of the corresponding linear regularization method, i.e. there exist constants C > 1 and δ_0, σ_0 > 0 such that

    ( E[ ‖â_k̄ − a†‖² ] )^{1/2} ≤ C min_{k ∈ ℕ} ( ‖E^app_k‖ + δ C_g/√α_k + σ φ_noi(α_k) )    (3.8)

for all δ ∈ (0, δ_0] and σ ∈ (0, σ_0].

Proof. We define the bounded noise events A_1 ⊂ A_2 ⊂ ... ⊂ A_{J(σ)} with J(σ) := ⌈ln(σ^{−2})/c_2⌉ and c_2 from (2.8c) by

    A_j := { â_k ∈ B_{2R}(a_0) and ‖E^noi_k‖ ≤ δ C_g/√α_k + √(τ_j(k, σ)) σ φ_noi(α_k), k = 1, ..., k* },
    with τ_j(k, σ) := j + (k* − k)(ln κ)/c_2 and κ > 1    (3.9)

(cf. (2.10b)). Due to (2.8b) we can choose κ > 1 sufficiently small such that τ_j satisfies condition (2.10a) for all j = 1, ..., J(σ). Definition (3.9) is motivated by the fact that an unusually large propagated data noise error at the first Newton steps, where the total error is dominated by ‖E^app_k‖, has less effect than an unusually large propagated data noise error at the last Newton steps. Lemma 2.2 and Lemma 3.4 below entail that the iterates â_1, ..., â_{k*} remain in B_R(a†) if the bounds on ‖E^noi_k‖, k = 1, ..., k*, in the definition (3.9) of A_j with j ≤ J(σ) are satisfied and σ is sufficiently small. Together with (2.8c) this yields for the probability of the complementary event

    P(∁A_j) ≤ c_1 Σ_{k=1}^{k*} exp( −c_2 τ_j(k, σ) ) = c_1 exp(−c_2 j) Σ_{m=0}^{k*−1} κ^{−m} ≤ c_1 exp(−c_2 j) (1 − κ^{−1})^{−1}.

By definition, we have k̄ = k* on the events A_j. Applying Lemma 2.2 and the inequality (x+y+z)² ≤ 3x² + 3y² + 3z², and using γ_nl ≤ 1 and τ_j(k*, σ) = j, we obtain

    ‖â_{k*} − a†‖² ≤ 12 ‖E^app_{k*}‖² + 12 δ² C²_g/α_{k*} + 12 j σ² φ_noi(α_{k*})² =: B_j on the event A_j.

Moreover, by the definition of the algorithm we have the error bound ‖â_k̄ − a†‖ ≤ ‖â_k̄ − a_0‖ + ‖a_0 − a†‖ ≤ 3R on the event ∁A_{J(σ)}. Note that J(σ) is chosen such that

e^{−c_2 J(σ)} ≤ σ². Hence,

    E[ ‖â_k̄ − a†‖² ] ≤ P(A_1) B_1 + Σ_{j=2}^{J(σ)} P(A_j \ A_{j−1}) B_j + P(∁A_{J(σ)}) 9R²
        ≤ 12 ‖E^app_{k*}‖² + 12 δ² C²_g/α_{k*} + 12 (σ φ_noi(α_{k*}))² ( 1 + (c_1/(1 − κ^{−1})) Σ_{j=2}^{∞} j e^{−c_2 (j−1)} ) + (c_1/(1 − κ^{−1})) e^{−c_2 J(σ)} 9R²
        ≤ C ( ‖E^app_{k*}‖ + δ C_g/√α_{k*} + σ φ_noi(α_{k*}) )²

for some constant C > 0, where we have also used e^{−c_2 J(σ)} ≤ σ² and φ_noi(α_{k*}) ≥ φ_noi(α_0) > 0. In the second line we have used that P(A_1) + Σ_{j=2}^{J(σ)} P(A_j \ A_{j−1}) + P(∁A_{J(σ)}) = 1 and P(A_j \ A_{j−1}) ≤ P(∁A_{j−1}). Now (3.8) follows as in the proof of Theorem 3.1. □

The proof of Theorem 3.2 is completed by the following two lemmas.

Lemma 3.3. Set Φ_noi(k) := δ C_g/√α_k + σ φ_noi(α_k), and redefine γ̲_noi := min{γ̲_noi, q^{−1/2}} and γ̄_noi := max{γ̄_noi, q^{−1/2}} (so that (3.12) below holds for this combined Φ_noi). Then the optimal stopping index k* defined in (3.7) satisfies the following bounds:

    Φ_noi(k*) ≤ (γ_app − 1)(1 − γ̲_noi^{−1})^{−1} ‖E^app_{k*}‖,    (3.10)

    k* ≥ sup{ k ∈ ℕ : (1 − γ_app^{−1}) ‖E^app_k‖ > inf_{l ∈ ℕ} ( C_r ρ √α_{k+l} + Φ_noi(1) γ̄_noi^{k+l−1} ) }.    (3.11)

Proof. It follows from (2.8b) that

    1 < γ̲_noi ≤ Φ_noi(k+1)/Φ_noi(k) ≤ γ̄_noi, k ∈ ℕ.    (3.12)

To show (3.10), assume on the contrary that

    (γ_app − 1) ‖E^app_{k*}‖ < (1 − γ̲_noi^{−1}) Φ_noi(k*).    (3.13)

Then we obtain, using (2.6a) and (3.12),

    ‖E^app_{k*−1}‖ + Φ_noi(k*−1) ≤ γ_app ‖E^app_{k*}‖ + γ̲_noi^{−1} Φ_noi(k*) < ‖E^app_{k*}‖ + Φ_noi(k*)

in contradiction to the definition of k*. Therefore, (3.13) is false, and (3.10) holds true. To show (3.11), assume that

    C_r ρ √α_{k+l} + Φ_noi(1) γ̄_noi^{k+l−1} ≤ (1 − γ_app^{−1}) ‖E^app_k‖    (3.14)

for some k, l ≥ 1. Then we have

    ‖E^app_{k+l}‖ + Φ_noi(k+l) ≤ C_r ρ √α_{k+l} + Φ_noi(1) γ̄_noi^{k+l−1} ≤ (1 − γ_app^{−1}) ‖E^app_k‖ < ‖E^app_k‖ + Φ_noi(k)

using (2.5), (2.6a), and (3.12). Therefore, it follows from the definition of k* that k* > k. Taking the infimum over l in (3.14) and then the supremum over k yields (3.11). □

Lemma 3.4. Under the assumptions of Theorem 3.2 there exists σ_0 > 0 such that

    k* ≤ k_max(σ, j) for all σ ≤ σ_0 and j = 1, ..., J(σ) := ⌈ln(σ^{−2})/c_2⌉,    (3.15)

with k_max(σ, j) defined as in Lemma 2.2 using τ_j in (3.9):

    k_max(σ, j) := max{ k ∈ ℕ : ( δ C_g/√α_k + √(τ_j(k, σ)) σ φ_noi(α_k) ) α_k^{−1/2} ≤ C_stop }.

Proof. It follows from (2.10a), the inequalities τ_j(k, σ) ≤ τ_{J(σ)}(k, σ) and τ_{J(σ)}(k*, σ) = J(σ) ≤ 2 ln(σ^{−2})/c_2 (for σ sufficiently small), and (3.10) combined with (3.3) that for all j ≤ J(σ) and k ≤ k*

    ( δ C_g/√α_k + √(τ_j(k, σ)) σ φ_noi(α_k) ) α_k^{−1/2} ≤ ( δ C_g/√α_{k*} + √(τ_{J(σ)}(k*, σ)) σ φ_noi(α_{k*}) ) α_{k*}^{−1/2}
        ≤ √( 2 ln(σ^{−2})/c_2 ) Φ_noi(k*) α_{k*}^{−1/2} ≤ C √(ln(σ^{−2})) α_{k*}^{μ−1/2}    (3.16)

with C := ρ C_μ (γ_app − 1)(1 − γ̲_noi^{−1})^{−1} √(2/c_2). Furthermore, a straightforward calculation shows that the infimum in (3.11) decays at a polynomial rate in σ, so k* ≥ κ̃ ln(σ^{−1}) for some κ̃ > 0. As lim_{x→∞} x q^{cx} = 0 for all c > 0, there exists σ_0 > 0 such that

    C √(ln(σ^{−2})) α_{k*}^{μ−1/2} ≤ C α_0^{μ−1/2} √(ln(σ^{−2})) q^{(κ̃/2) ln(σ^{−2}) (μ−1/2)} ≤ C_stop

for all σ ∈ (0, σ_0]. This together with (3.16) gives the assertion. □

The right hand side of the estimate (3.8) is known to be an order optimal error bound for the linearized problem (2.2) under mild assumptions (cf. [5]). Again, more explicit bounds can easily be derived under more specific assumptions. E.g., if δ = 0 and φ_noi is given by (2.9), we obtain

    ( E[ ‖â_k̄ − a†‖² ] )^{1/2} = O( ρ^{c/(μ+c)} σ^{μ/(μ+c)} ), σ → 0.    (3.17)

4. Adaptation by Lepskiĭ's balancing principle. The stopping index k* in Theorem 3.2 cannot be computed since it depends on ‖E^app_k‖ and hence on the smoothness of the difference a_0 − a† of the initial guess and the unknown solution. In this section we address the problem of how to choose the stopping index of the Newton iteration adaptively using a Lepskiĭ-type balancing principle. We will present the balancing principle in a general framework adapted from [19]. From our choice of notation it will be obvious how the regularized Newton methods (1.3) fit into this framework and how the stopping index can be selected for these methods. As the same method will hopefully also apply to other regularization methods than (1.3), we have decided to use this more general setting.

[Fig. 4.1. Illustration of the setting in section 4.]

General abstract setting. Let â_0, â_1, ..., â_{k_max} be estimators of a† in a metric space (X, d), and let Φ_noi, Φ_app, Φ_nl : ℕ_0 → [0, ∞) be functions such that

    d(â_k, a†) ≤ Φ_noi(k) + Φ_app(k) + Φ_nl(k), k ≤ k_max.    (4.1)

We assume that Φ_noi is known and non-decreasing, Φ_app is unknown and non-increasing, and Φ_nl is unknown and satisfies

    Φ_nl(k) ≤ γ_nl ( Φ_noi(k) + Φ_app(k) ), k = 0, ..., k_max,    (4.2)

for some γ_nl > 0. This is illustrated in Fig. 4.1. Note that under the assumptions of Lemma 2.2 these inequalities are satisfied for the iterates of the generalized Gauss-Newton method (1.3) with Φ_app(k) := ‖E^app_k‖ and Φ_nl(k) := ‖E^nl_k‖. As in [3] we consider the following Lepskiĭ-type parameter selection rule (a computational sketch follows the definition):

    k_bal := min{ k ≤ k_max : d(â_k, â_m) ≤ 4(1 + γ_nl) Φ_noi(m), m = k+1, ..., k_max }.    (4.3)
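For a discretized problem, the rule (4.3) only requires the computed iterates and the known function Φ_noi. The following Python sketch is our own illustration (names and data layout are assumptions, and d is taken to be the Euclidean norm):

    import numpy as np

    def lepskii_index(iterates, Phi_noi, gamma_nl=0.1):
        """Balancing rule (4.3): the smallest k such that
        ||a_k - a_m|| <= 4 (1 + gamma_nl) Phi_noi(m) for all m = k+1, ..., k_max."""
        k_max = len(iterates) - 1
        for k in range(k_max + 1):
            if all(np.linalg.norm(iterates[k] - iterates[m]) <= 4 * (1 + gamma_nl) * Phi_noi(m)
                   for m in range(k + 1, k_max + 1)):
                return k
        return k_max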

Deterministic errors. We first recall some results from [19, 21] for linear problems, i.e. Φ_nl ≡ 0. Assume that Φ_noi satisfies

    Φ_noi(k+1) ≤ γ_noi Φ_noi(k), k = 1, 2, ..., k_max − 1,    (4.4)

for some constant γ_noi < ∞, and let

    k† := min{ k ≤ k_max : Φ_app(k) ≤ Φ_noi(k) }.    (4.5)

Then, as shown by Mathé & Pereverzev [21], the following deterministic oracle inequality holds true:

    d(â_{k_bal}, a†) ≤ 6 Φ_noi(k†) ≤ 6 γ_noi min_{k=1,...,k_max} ( Φ_app(k) + Φ_noi(k) ).    (4.6)

This shows that k_bal yields an optimal error bound up to a factor 6γ_noi. If (4.1) and (4.2) hold true, then Assumption 2.1 in [3] is satisfied with E(k)δ = 2(1 + γ_nl)Φ_noi(k) (in the notation of [3]) and

    k̃ := min{ k ≤ k_max : Φ_noi(k) + Φ_app(k) + Φ_nl(k) ≤ 2(1 + γ_nl) Φ_noi(k) }.    (4.7)

Therefore, Theorem 2.3 in [3] implies the error estimate

    d(â_{k_bal}, a†) ≤ 6(1 + γ_nl) Φ_noi(k̃).    (4.8)

We obtain the following order optimality result inspired by Mathé [19]:

Theorem 4.1 (deterministic oracle inequality). Assume (4.1)–(4.4). Then

    d(â_{k_bal}, a†) ≤ 6(1 + γ_nl) Φ_noi(k†) ≤ 6(1 + γ_nl) γ_noi min_{k ∈ {1,...,k_max}} ( Φ_app(k) + Φ_noi(k) ).    (4.9)

Proof. As Φ_app(k†) ≤ Φ_noi(k†) implies

    Φ_noi(k†) + Φ_app(k†) + Φ_nl(k†) ≤ (1 + γ_nl)( Φ_app(k†) + Φ_noi(k†) ) ≤ 2(1 + γ_nl) Φ_noi(k†),

we have k̃ ≤ k† with k† defined in (4.5). Therefore, we get

    d(â_{k_bal}, a†) ≤ 6(1 + γ_nl) Φ_noi(k̃) ≤ 6(1 + γ_nl) Φ_noi(k†) ≤ 6(1 + γ_nl) γ_noi min{ Φ_app(k) + Φ_noi(k) : k = 1, ..., k_max },

where the first inequality is (4.8) and the last inequality follows as in (4.6). □

Obviously, we could add the term Φ_nl(k) to Φ_app(k) + Φ_noi(k) on the right hand side of (4.9). Hence, Theorem 4.1 implies that the Lepskiĭ rule (4.3) leads to an optimal error bound up to a factor 6(1 + γ_nl)γ_noi among all k = 1, ..., k_max. However, Theorem 4.1 even implies the stronger result that we obtain the same rates of convergence as in the linear case, which are often known to be minimax. Setting Φ_app(k) := ‖E^app_k‖, Φ_noi(k) := δ C_g/√α_k, and Φ_nl(k) := ‖E^nl_k‖ yields the following oracle inequality for the Gauss-Newton iteration, where we have replaced {1, ..., k_max} by ℕ (see Theorem 3.1):

Corollary 4.2. Let the assumptions of Lemma 2.2 hold true for σ = 0, i.e. Φ_noi(k) = δ C_g/√α_k. Furthermore let k_bal be chosen as in (4.3). Then

    ‖â_{k_bal} − a†‖ ≤ 6(1 + γ_nl) γ_noi min_{k ∈ ℕ} ( ‖E^app_k‖ + C_g δ/√α_k ).    (4.10)

Remark 4.3. If a† belongs to the smoothness class M_{μ,ρ} := { a_0 + (T*T)^μ w : ‖w‖ ≤ ρ } defined by the source condition (3.2) with μ ∈ [1/2, μ_0], then it follows from (4.10) and (3.3) that

    ‖â_{k_bal} − a†‖ ≤ 6(1 + γ_nl) γ_noi min_{k ∈ ℕ} ( C_μ α_k^μ ρ + C_g δ/√α_k ) = O( ρ^{1/(2μ+1)} δ^{2μ/(2μ+1)} ), δ → 0,    (4.11)

and for linear problems it is well known that among all possible methods this is the best possible uniform estimate over M_{μ,ρ} up to a constant (see [12]). Assume now that μ ∈ [1/2, μ_0). Then lim_{α→0} (λ/α)^μ r_α(λ) = 0 for all λ ≥ 0 and sup_{λ,α} (λ/α)^μ r_α(λ) ≤ C_μ, and it follows from spectral theory and Lebesgue's dominated convergence theorem that α^{−μ} ‖r_α(T*T)(T*T)^μ w‖ → 0 as α → 0 for all w ∈ X, i.e.

    ‖E^app_k‖ / ( C_μ ρ α_k^μ ) → 0, k → ∞.    (4.12)

As we will show in a moment, this implies that for all a† ∈ M_{μ,ρ}

    min_{k ∈ ℕ} ( ‖E^app_k‖ + C_g δ/√α_k ) / min_{k ∈ ℕ} ( C_μ α_k^μ ρ + C_g δ/√α_k ) → 0, δ → 0.    (4.13)

Eq. (4.13) is the deterministic analog of what is known as superefficiency in statistics (see [7]). To show (4.13), let ε > 0. Using Lemma 3.3, eq. (3.11), with σ = δ and φ_noi(α) = C_g/√α, we obtain that k*(δ) := argmin_{k ∈ ℕ}( ‖E^app_k‖ + C_g δ/√α_k ) tends to ∞ as δ → 0. Therefore, it follows from (4.12) that there exists δ_0 such that ‖E^app_{k*(δ)}‖ ≤ ε C_μ ρ α^μ_{k*(δ)} for all δ < δ_0. Using eq. (3.10) in Lemma 3.3 we obtain

    C_g δ/√α_{k*(δ)} ≤ C̃ ‖E^app_{k*(δ)}‖ ≤ C̃ ε C_μ ρ α^μ_{k*(δ)}

with C̃ := (γ_app − 1)/(1 − √q). Now a straightforward computation shows that

    inf_{α > 0} ( C_μ α^μ ρ + C_g δ/√α ) ≥ C̄ (C̃ε)^{2μ/(2μ+1)} C_μ ρ α^μ_{k*(δ)} ≥ C̄ (C̃ε)^{−1/(2μ+1)} ( ‖E^app_{k*(δ)}‖ + C_g δ/√α_{k*(δ)} ) / (1 + C̃)

with C̄ > 0 independent of ε and δ. This shows (4.13) since we can make (C̃ε)^{1/(2μ+1)} arbitrarily small.

Eq. (4.13) implies that although estimates of the form (4.11), known in deterministic regularization theory for the discrepancy principle and improved parameter choice rules ([12, 16]), are order optimal as uniform estimates over a smoothness class, they are suboptimal by an arbitrarily large factor for each individual element of the smoothness class in the limit δ → 0. To our knowledge it is an open question whether or not deterministic parameter choice rules other than Lepskiĭ's balancing principle are optimal in the more restrictive sense of oracle inequalities.

General noise model. We return to the general noise model (1.2).

Theorem 4.4. Let the assumptions of Lemma 2.2 hold true with k_max determined by (2.14) with

    Φ_noi(k) := δ C_g/√α_k + √( ln(σ^{−2})/c_2 ) σ φ_noi(α_k),

and assume (3.2) with μ > 1/2. Let k̄ := k_bal as in (4.3) if â_k ∈ B_{2R}(a_0) for k = 1, ..., k_max, and k̄ := 0 else. Then there exist constants C, δ_0, σ_0 > 0 such that

    ( E[ ‖â_k̄ − a†‖² ] )^{1/2} ≤ C min_{k ∈ ℕ} ( ‖E^app_k‖ + δ C_g/√α_k + √(ln σ^{−1}) σ φ_noi(α_k) )    (4.14)

for δ ∈ [0, δ_0] and σ ∈ (0, σ_0].

Proof. First consider the event A_{τ,k_max} of bounded noise defined by (2.10b) with τ(k, σ) = ln(σ^{−2})/c_2. Then the assumptions of Theorem 4.1 are satisfied with Φ_app(k) := ‖E^app_k‖ and Φ_nl(k) := ‖E^nl_k‖ due to (2.3a), (2.8b), and Lemma 2.2, and we get

    ‖â_{k_bal} − a†‖ ≤ 6(1 + γ_nl) γ_noi min_{k=1,...,k_max} ( ‖E^app_k‖ + Φ_noi(k) ).

As μ > 1/2, we can use the same arguments as in Lemma 3.4 to show that k* := argmin_{k ∈ ℕ}( ‖E^app_k‖ + Φ_noi(k) ) ≤ k_max for δ, σ small enough, and hence the minimum may be taken over all k ∈ ℕ. Moreover, we have

    P(∁A_{τ,k_max}) ≤ c_1 Σ_{k=1}^{k_max} exp( −ln σ^{−2} ) = c_1 k_max σ² ≤ C_tail σ² ln σ^{−1}

for some constant C_tail > 0. For the last inequality we have used that k_max = O(ln σ^{−1}) due to the definition of k_max and the fact that φ_noi is decreasing. Using

again that φ_noi is decreasing and that k* → ∞ as σ → 0, we get

    P(∁A_{τ,k_max}) ≤ C_tail σ² ln σ^{−1} ≤ C ( min_{k ∈ ℕ} ( ‖E^app_k‖ + δ C_g/√α_k + √(ln σ^{−1}) σ φ_noi(α_k) ) )²

for σ small enough with a generic constant C. Hence,

    E[ ‖â_k̄ − a†‖² ] ≤ P(A_{τ,k_max}) ( 6(1 + γ_nl) γ_noi min_{k=1,...,k_max} ( ‖E^app_k‖ + Φ_noi(k) ) )² + P(∁A_{τ,k_max}) (3R)²
        ≤ C² min_{k ∈ ℕ} ( ‖E^app_k‖ + δ C_g/√α_k + √(ln σ^{−1}) σ φ_noi(α_k) )². □

In particular, for δ = 0 and φ_noi given by (2.9), we obtain

    ( E[ ‖â_k̄ − a†‖² ] )^{1/2} = O( ρ^{c/(μ+c)} ( σ √(ln σ^{−1}) )^{μ/(μ+c)} ), σ → 0.    (4.16)

Comparing (4.16) to (3.17), or (4.14) to (3.8), shows that we have to pay a logarithmic factor for adaptivity. As shown in [26] it is impossible to achieve optimal rates adaptively in the general situation considered in this paper. However, for special classes of linear inverse problems which are not too ill-posed, order optimal adaptive parameter choice rules have been devised (see e.g. [9]). It remains an interesting open problem to construct order optimal adaptive stopping rules for mildly ill-posed nonlinear statistical inverse problems.

5. Numerical experiments. To test the predicted rates of convergence with random noise and the performance of the stopping rule (4.3), we consider a simple parameter identification problem for an ordinary differential equation where the forward operator F is easy to evaluate and reliable conclusions can be obtained by Monte Carlo simulations within reasonable time. The efficiency of the iteratively regularized Gauss-Newton method for solving large-scale inverse problems has been sufficiently demonstrated in a number of previous publications (see e.g. [13]).

Forward operator. For a given positive function a ∈ L²([0,1]) and a given right hand side f ∈ L²([0,1]) let u ∈ H²_per([0,1]) denote the solution to the ordinary differential equation

    −u″ + au = f in [0,1].    (5.1)

Here H^s_per([0,1]), s ≥ 0, denotes the periodic Sobolev space of order s with norm

    ‖u‖²_{H^s_per} = Σ_{j ∈ ℤ} (1 + j²)^s | ∫₀¹ u(x) exp(−2πijx) dx |².

We introduce the parameter-to-solution operator F : D(F) ⊂ L²([0,1]) → L²([0,1]), a ↦ u. If G denotes the inverse of the differential operator −∂²_x + 1 with periodic boundary conditions, the differential equation (5.1) can equivalently be reformulated as the integral equation

    u + G M_{a−1} u = G f

with the multiplication operator M_{a−1} u := (a − 1)u. To prove the Fréchet differentiability of F, it is convenient to consider M_{a−1} as an operator from C_per([0,1]) to L²([0,1]) and G as an operator from L²([0,1]) to C_per([0,1]). Then M_{a−1} is bounded and G is compact, and it is easy to show that F is analytic on the domain

    D(F) := { a ∈ L²([0,1]) : I + G M_{a−1} boundedly invertible in C_per([0,1]) }.

In particular, F satisfies the Lipschitz condition (2.11). By a Neumann series argument, D(F) is open, and it contains all positive functions. Differentiation of (5.1) with respect to a shows that for a perturbation h of a the Fréchet derivative u_h = F′[a]h satisfies the differential equation

    −u_h″ + a u_h = −h u in [0,1],

where u is the solution to (5.1). This implies that F′[a]h = −(I + G M_{a−1})^{−1} G M_u h.

We briefly sketch how Hölder source conditions (3.2) can be interpreted as smoothness conditions in Sobolev spaces under certain conditions. Assume that a† ∈ H^s ∩ D(F) with s > 1/2, and f ∈ C^∞. Here and in the following we shortly write H^s for H^s_per([0,1]). Since G : H^t → H^{t+2} is an isomorphism for all t ≥ 0, and uv ∈ H^t for u, v ∈ H^t with ‖uv‖_{H^t} ≤ C‖u‖_{H^t}‖v‖_{H^t} for t > 1/2, it can be shown that I + G M_{a†−1} : H^t → H^t is an isomorphism for t ∈ [0, s] and that u ∈ H^{s+2}. Moreover, the operators F′[a], F′[a]* : H^t → H^{min(s, t+2)} are bounded, and if u has no zeros in [0,1] and t ≤ s − 2, the inverses are bounded as well. Under this additional assumption, it can be shown using Heinz' inequality (see [12]) that (F′[a]* F′[a])^μ : L² → H^{4μ} is an isomorphism if 4μ − 2 ≤ s. Thus, a†, a_0 ∈ H^s implies a Hölder source condition with μ = s/4. Moreover, the singular values of F′[a] behave like σ_j(F′[a]) ~ j^{−2}, and this asymptotic behavior is uniform for all a with ‖a‖_{L²} ≤ C.
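A discretization of this forward operator is easy to set up, since G is diagonal in the Fourier basis. The following Python sketch is our own illustration (the grid, the names, and the dense-matrix representation are assumptions, not part of the paper):

    import numpy as np

    def make_forward_operator(n, f):
        """Discretization of the forward operator of (5.1) on the grid x_j = j/n,
        using the fixed-point form u + G M_{a-1} u = G f with
        G = (-d^2/dx^2 + 1)^{-1}, which is diagonal in the Fourier basis."""
        freq = np.fft.fftfreq(n, d=1.0 / n)              # integer frequencies j
        g_hat = 1.0 / (1.0 + (2.0 * np.pi * freq) ** 2)  # symbol of G
        # dense matrix of G, built by applying it to the columns of the identity
        G = np.fft.ifft(g_hat[:, None] * np.fft.fft(np.eye(n), axis=0), axis=0).real

        def F(a):
            A = np.eye(n) + G * (a - 1.0)[None, :]       # I + G M_{a-1}
            return np.linalg.solve(A, G @ f)             # u = F(a)

        def Fprime(a):
            u = F(a)
            A = np.eye(n) + G * (a - 1.0)[None, :]
            # Jacobian F'[a] = -(I + G M_{a-1})^{-1} G M_u, column by column
            return -np.linalg.solve(A, G * u[None, :])

        return F, Fprime

The returned callables F and Fprime can be passed to the Newton loop sketched after Fig. 2.1.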

Noise model. Let us assume that our data are n noisy measurements of u† = F(a†) at the equidistant points x_j^{(n)} := j/n,

    Y_j = u†(x_j^{(n)}) + ε_j, j = 1, ..., n.    (5.2)

The measurement errors are modeled by i.i.d. random variables ε_j satisfying E[ε_j] = 0 and Var ε_j = σ²_ε < ∞. For n even we introduce the space Π_n := span{ e^{2πijx} : j = −n/2, ..., n/2 − 1 } and the linear mapping S_n : ℝⁿ → Π_n, which maps a vector u = (u_1, ..., u_n) of nodal values to the unique trigonometric interpolation polynomial S_n u ∈ Π_n satisfying (S_n u)(x_j^{(n)}) = u_j, j = 1, ..., n. We will show that u^obs := S_n Y with Y := (Y_1, ..., Y_n) satisfies assumption (1.2a). Hence, we interpret u^obs as our observed data. Since √n S_n is unitary, u^obs and Y have unitarily equivalent covariance operators. We have δη = E[u^obs] − F(a†) and σξ = u^obs − E[u^obs] in (1.2a), and E[u^obs] is the trigonometric interpolation polynomial of F(a†) at the points x_j^{(n)}. Hence, by standard estimates of the trigonometric interpolation error (see e.g. [24, Cor. 2.47]) the deterministic error of the data function u^obs is bounded by

    δ = ‖E[u^obs] − u†‖_{L²} ≤ C n^{−2} ‖u†‖_{H²}.

Moreover, the covariance operator of the stochastic noise is given by

    Cov_{u^obs} = Cov_{S_n ε} = S_n Cov_ε S_n* = (σ²_ε/n) P_n,

where ε := (ε_1, ..., ε_n) and P_n ∈ L(L²([0,1])) is the orthogonal projection onto Π_n. Note that the stochastic noise level

    σ = σ_ε/√n    (5.3)

dominates the deterministic noise level δ = O(n^{−2}) for large n.

[Table 5.1 reports, for σ = 0.1 (n = 200, 800, 3200 with σ_ε = 0.01, 0.02, 0.04; k_max = 14) and for σ = 0.01 (n = 200, 800, 3200; k_max = 22), the quantities min_k ‖â_k − a†‖ and argmin_k ‖â_k − a†‖ (optimal), ‖â_{k_bal} − a†‖, k_bal, and I_bal (balancing principle), and ‖â_{k_discr} − a†‖, k_discr, and I_discr (discrepancy principle).]

Table 5.1. Comparison of balancing principle and discrepancy principle as stopping rules. For each value of n and σ_ε the algorithm was run 100 times. The values after ± denote standard deviations.

[Fig. 5.1. Left panel: rates of convergence of (E‖a† − â_{k_bal}‖²)^{1/2} as σ → 0 for different smoothness of the exact solution; the triangles indicate the rates predicted by theory and the bars the empirical standard deviations of the reconstruction error. Right panel: exact solutions.]
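The discrete noise model (5.2) and the mapping S_n can be simulated in a few lines; the FFT yields the Fourier coefficients of the trigonometric interpolant. Again the names in this Python sketch are our own assumptions:

    import numpy as np

    def simulate_data(u_exact_nodal, sigma_eps, rng=None):
        """Simulate (5.2): Y_j = u(x_j) + eps_j with i.i.d. N(0, sigma_eps^2)
        errors; u_obs = S_n Y is represented by the Fourier coefficients of the
        trigonometric interpolant, computed by the FFT."""
        rng = np.random.default_rng() if rng is None else rng
        Y = u_exact_nodal + sigma_eps * rng.standard_normal(u_exact_nodal.size)
        u_obs_hat = np.fft.fft(Y) / Y.size   # coefficients in the basis e^{2*pi*i*j*x}
        return Y, u_obs_hat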

Numerical results. As exact solutions a† we used three functions of different smoothness shown in the right panel of Fig. 5.1. These functions were defined in terms of Fourier coefficients such that they belong to ∩_{t>s} H^t_per([0,1]) with s ∈ {0.5, 1.5, 3.5}. The initial guess was always chosen as the constant function 1. We never had to stop the iteration early because an iterate was not in the domain of definition of F.

We first tested the performance of the balancing principle for σ = 0.1 and σ = 0.01 and three different values of n (cf. eq. (5.3)) for the curve with s = 1.5. For each value of σ and n, 100 independent data vectors Y were drawn from a Gaussian distribution according to the additive noise model (5.2). We chose

    Φ_noi(k) := ( (1/25) Σ_{l=1}^{25} ‖g_{α_k}(T*T) T* ε^{(l)}‖² )^{1/2},

where the ε^{(l)} are independent copies of the noise vector (this estimator and the discrepancy rule below are sketched in code at the end of this subsection). Moreover, we set γ_nl = 0.1 in (4.3). Note that this choice of Φ_noi is not fully covered by our theory, since we only use an estimator of the expected value, since, as opposed to (2.8a), we do not have a uniform bound over a ∈ D(F), and since we dropped the logarithmic factor in Theorem 4.4. Furthermore, recall that the Lepskiĭ rule requires the computation of iterates up to a fixed k_max specified in (2.14). Usually the Lipschitz constant L involved in the definition (2.14) of k_max is not known exactly. Fortunately, we only need an upper bound on the Lipschitz constant, and the experimental results are insensitive to this choice. As expected from the assumptions of our convergence results, the reconstructions are also insensitive to the choice of α_0 > 0 and q ∈ (0,1) as long as they are sufficiently large. Further increasing α_0 and q results in more Newton steps and hence more computational effort to reach the optimal value of α_k = α_0 q^k, but in no noticeable difference in the reconstructions.

The results of our first series of simulations are summarized in Table 5.1. As inefficiency index of a stopping rule k̄ we used the number

    I := ( E[ ‖â_k̄ − a†‖² ] )^{1/2} / ( E[ min_{k=0,...,k_max} ‖â_k − a†‖² ] )^{1/2}.

The results displayed in Table 5.1 demonstrate that both the expected optimal error (the denominator in the previous expression) and the expected error for the balancing principle (4.3) (the numerator) depend only on σ = σ_ε/√n, but not on n. This is in contrast to the discrepancy principle, which is defined in a discrete setting by

    k_discr := min{ k : n^{−1/2} ‖F(â_k) − Y‖_{ℝⁿ} ≤ τ σ_ε }.

Here τ = 2.1, Y = (Y_j)_{j=1,...,n} is the vector defined in (5.2), (F(a))_j := (F(a))(x_j^{(n)}), j = 1, ..., n, and the factor n^{−1/2} in front of the Euclidean norm in ℝⁿ is chosen such that n^{−1/2}‖F(a)‖_{ℝⁿ} ≈ ‖F(a)‖_{L²([0,1])}. The discrepancy principle for the noise model (5.2) works fairly well for small n, but badly for large n, as previously observed e.g. in [18] for linear problems. The reason is that the standard deviation of the measurement error, n^{−1/2} ( E[ Σ_{j=1}^n ε_j² ] )^{1/2} = σ_ε = σ √n, tends to infinity as n → ∞ with σ held constant, and hence the discrepancy principle stops too early. In fact this happens almost always at the first step for n sufficiently large, whereas the optimal stopping index, which asymptotically depends only on σ but not on n, may be arbitrarily large for small σ.

Finally we tested the rates of convergence with the balancing principle for the three curves a† shown in the right panel of Fig. 5.1. We always chose n = 128 and varied σ_ε. For each value of σ and each of the three curves a† we performed 50 runs of the iteratively regularized Gauss-Newton method.
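The two stopping-rule ingredients used above, i.e. the Monte Carlo estimator of Φ_noi and the discrete discrepancy index, can be sketched as follows. This is our own illustration: J denotes a discretized Jacobian, F_nodal evaluates F at the grid points, and we assume nonzero singular values.

    import numpy as np

    def estimate_Phi_noi(J, alphas, sigma_eps, m=1, n_samples=25, rng=None):
        """Monte Carlo estimate of Phi_noi used in section 5:
        Phi_noi(k) ~ ( (1/25) sum_l ||g_{alpha_k}(J^T J) J^T eps^(l)||^2 )^{1/2},
        with eps^(l) independent copies of the noise vector, via the SVD of J."""
        rng = np.random.default_rng() if rng is None else rng
        U, s, Vt = np.linalg.svd(J, full_matrices=False)
        eps = sigma_eps * rng.standard_normal((n_samples, J.shape[0]))
        coef = eps @ U          # components of eps^(l) along the left singular vectors
        def filt(a):            # g_a(s^2) * s, the action on singular components
            return ((s**2 + a)**m - a**m) / (s**2 * (s**2 + a)**m) * s
        return np.array([np.sqrt(np.mean(np.sum((coef * filt(a))**2, axis=1)))
                         for a in alphas])

    def discrepancy_index(iterates, F_nodal, Y, sigma_eps, tau=2.1):
        """Discrete discrepancy principle: first k with
        n^{-1/2} ||F(a_k) - Y|| <= tau * sigma_eps."""
        n = Y.size
        for k, a_k in enumerate(iterates):
            if np.linalg.norm(F_nodal(a_k) - Y) / np.sqrt(n) <= tau * sigma_eps:
                return k
        return len(iterates) - 1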

The triangles in the left panel of Fig. 5.1 show the rates σ^{1/6}, σ^{3/8}, and σ^{7/12}, which are obtained by neglecting the logarithmic factor in (4.16) and setting c = 5/8 and μ = 1/8, 3/8, 7/8, following the discussion above. Note that the experimental rates agree quite well with these predicted rates even for the first two functions, which are not smooth enough to be covered by our theory.

In summary, we have demonstrated that the performance of the balancing principle is independent of the sample size n, whereas the discrepancy principle works well for small n but becomes more and more inefficient as n → ∞. Moreover, for the balancing principle the empirical rates of convergence match the theoretical rates very well.

Appendix A. An exponential inequality for Gaussian noise. In this appendix we prove the exponential inequality (2.8c) for the case that the noise process ξ in (1.2a) is Gaussian. Let us consider the random variable V = ‖Λξ‖² for an arbitrary linear operator Λ : Y → X such that E[‖Λξ‖²] < ∞. Since Cov_{Λξ} = Λ M Λ* with M := Cov_ξ, the operator Λ M Λ* is trace class, i.e. the eigenvalues λ_1 ≥ λ_2 ≥ ... of Λ M Λ* satisfy Σ_{i=1}^∞ λ_i = E[V] < ∞. Note that M^{−1/2} ξ is a Gaussian white noise process, i.e. the random variables ξ_i := ⟨M^{−1/2} ξ, φ_i⟩ are i.i.d. standard normal for any orthonormal system {φ_i} in Y. In particular, if {φ_i} is a system of left singular vectors of Λ M^{1/2}, then V = Σ_{i=1}^∞ λ_i ξ_i². The random variables ξ_i² are i.i.d. χ²_1 with Laplace transform E[exp(t ξ_i²)] = (1 − 2t)^{−1/2}, 0 < t < 1/2. Then it holds that

    P( Σ_{i=1}^∞ λ_i (ξ_i² − 1) ≥ η ( 2 Σ_{i=1}^∞ λ_i² )^{1/2} ) ≤ exp( −η/8 )    (A.1)

for any η > 0 (see [11, 25]).

Theorem A.1. Under the above conditions we have for any τ ≥ 1 that

    P( V ≥ τ E[V] ) ≤ exp( −(1/8) C̃ (τ − 1) ),    (A.2)

where C̃ := 2^{−1/2} Σ_{i=1}^∞ λ_i / ( Σ_{i=1}^∞ λ_i² )^{1/2}.

Proof. Rewrite

    { V ≥ τ E[V] } = { Σ_{i=1}^∞ λ_i ξ_i² ≥ τ Σ_{i=1}^∞ λ_i } = { Σ_{i=1}^∞ λ_i (ξ_i² − 1) ≥ η ( 2 Σ_{i=1}^∞ λ_i² )^{1/2} },

where η = (τ − 1) C̃, and apply (A.1). □

Since λ_i ≥ 0, we have Σ_{i=1}^∞ λ_i² ≤ ( Σ_{i=1}^∞ λ_i )², and hence C̃ ≥ 2^{−1/2} for any eigenvalue sequence (λ_i). Therefore, choosing Λ := g_α(F′[a]* F′[a]) F′[a]*, we obtain (2.8c) with c_1 = exp(2^{−1/2}/8) and c_2 = 2^{−1/2}/8. A variation of the proof in [25] allows in principle an extension to more general (non-χ²) random variables under proper growth conditions on the Laplace transform of the ξ_i². This would again give an exponential bound of the form (2.8c).

Acknowledgment. The support of the Graduiertenkolleg 1023, the DFG research group FOR 916, and the Upper Austrian Technology and Research Promotion is gratefully acknowledged.

REFERENCES

[1] A. B. Bakushinsky. The problem of the convergence of the iteratively regularized Gauss-Newton method. Comput. Math. Math. Phys., 32, 1992.
[2] A. B. Bakushinsky and M. Yu. Kokurin. Iterative Methods for Approximate Solution of Inverse Problems. Springer, Dordrecht, 2004.
[3] F. Bauer and T. Hohage. A Lepskij-type stopping rule for regularized Newton methods. Inverse Problems, 21, 2005.
[4] N. Bissantz, T. Hohage, and A. Munk. Consistency and rates of convergence of nonlinear Tikhonov regularization with random noise. Inverse Problems, 20, 2004.
[5] N. Bissantz, T. Hohage, A. Munk, and F. Ruymgaart. Convergence rates of general regularization methods for statistical inverse problems and applications. SIAM J. Numer. Anal., 45, 2007.
[6] B. Blaschke, A. Neubauer, and O. Scherzer. On convergence rates for the iteratively regularized Gauss-Newton method. IMA J. Numer. Anal., 17:421–436, 1997.
[7] T. T. Cai and M. G. Low. Nonparametric estimation over shrinking neighborhoods: superefficiency and adaptation. Ann. Stat., 33:184–213, 2005.
[8] E. Candès. Modern statistical estimation via oracle inequalities. Acta Numerica, 15:257–325, 2006.
[9] L. Cavalier, G. K. Golubev, D. Picard, and A. B. Tsybakov. Oracle inequalities for inverse problems. Ann. Stat., 30:843–874, 2002.
[10] R. Cont and P. Tankov. Retrieving Lévy processes from option prices: Regularization of an ill-posed inverse problem. SIAM J. Control Optim., 45(1):1–25, 2006.
[11] R. Dahlhaus and W. Polonik. Nonparametric quasi maximum likelihood estimation for Gaussian locally stationary processes. Ann. Statist., 34, 2006.
[12] H. W. Engl, M. Hanke, and A. Neubauer. Regularization of Inverse Problems. Kluwer Academic Publishers, Dordrecht, Boston, London, 1996.
[13] T. Hohage. Fast numerical solution of the electromagnetic medium scattering problem and applications to the inverse problem. J. Comp. Phys., 214:224–238, 2006.
[14] T. Hohage and M. Pricop. Nonlinear Tikhonov regularization in Hilbert scales for inverse boundary value problems with random noise. Inverse Probl. Imaging, 2:271–290, 2008.
[15] B. Kaltenbacher. Some Newton-type methods for the regularization of nonlinear ill-posed problems. Inverse Problems, 13:729–753, 1997.
[16] B. Kaltenbacher. A posteriori parameter choice strategies for some Newton type methods for the regularization of nonlinear ill-posed problems. Numer. Math., 79:501–528, 1998.
[17] J.-M. Loubes and C. Ludeña. Penalized estimators for nonlinear inverse problems. Preprint, arxiv.org:math, v1.
[18] M. A. Lukas. Comparison of parameter choice methods for regularization with discrete noisy data. Inverse Problems, 14:161–184, 1998.
[19] P. Mathé. The Lepskiĭ principle revisited. Inverse Problems, 22:L11–L15, 2006.
[20] P. Mathé and S. Pereverzev. Geometry of ill-posed problems in variable Hilbert scales. Inverse Problems, 19:789–803, 2003.
[21] P. Mathé and S. Pereverzev. Regularization of some linear ill-posed problems with discretized random noisy data. Math. Comp., 75, 2006.
[22] W. K. Newey and J. L. Powell. Instrumental variable estimation of nonparametric models. Econometrica, 71, 2003.
[23] F. O'Sullivan. Convergence characteristics of method of regularization estimators for nonlinear operator equations. SIAM J. Numer. Anal., 27, 1990.
[24] S. Prössdorf and B. Silbermann. Numerical Analysis for Integral and Related Operator Equations. Birkhäuser, Basel, 1991.
[25] A. Rohde and L. Dümbgen. Confidence sets for the best approximating model - bridging a gap between adaptive point estimation and confidence regions. Preprint.
[26] A. B. Tsybakov. On the best rate of adaptive estimation in some inverse problems. C. R. Acad. Sci. Paris, 330:835–840, 2000.


More information

Piecewise Smooth Solutions to the Burgers-Hilbert Equation

Piecewise Smooth Solutions to the Burgers-Hilbert Equation Piecewise Smooth Solutions to the Burgers-Hilbert Equation Alberto Bressan and Tianyou Zhang Department of Mathematics, Penn State University, University Park, Pa 68, USA e-mails: bressan@mathpsuedu, zhang

More information

460 HOLGER DETTE AND WILLIAM J STUDDEN order to examine how a given design behaves in the model g` with respect to the D-optimality criterion one uses

460 HOLGER DETTE AND WILLIAM J STUDDEN order to examine how a given design behaves in the model g` with respect to the D-optimality criterion one uses Statistica Sinica 5(1995), 459-473 OPTIMAL DESIGNS FOR POLYNOMIAL REGRESSION WHEN THE DEGREE IS NOT KNOWN Holger Dette and William J Studden Technische Universitat Dresden and Purdue University Abstract:

More information

2 Garrett: `A Good Spectral Theorem' 1. von Neumann algebras, density theorem The commutant of a subring S of a ring R is S 0 = fr 2 R : rs = sr; 8s 2

2 Garrett: `A Good Spectral Theorem' 1. von Neumann algebras, density theorem The commutant of a subring S of a ring R is S 0 = fr 2 R : rs = sr; 8s 2 1 A Good Spectral Theorem c1996, Paul Garrett, garrett@math.umn.edu version February 12, 1996 1 Measurable Hilbert bundles Measurable Banach bundles Direct integrals of Hilbert spaces Trivializing Hilbert

More information

A note on the σ-algebra of cylinder sets and all that

A note on the σ-algebra of cylinder sets and all that A note on the σ-algebra of cylinder sets and all that José Luis Silva CCM, Univ. da Madeira, P-9000 Funchal Madeira BiBoS, Univ. of Bielefeld, Germany (luis@dragoeiro.uma.pt) September 1999 Abstract In

More information

2014:05 Incremental Greedy Algorithm and its Applications in Numerical Integration. V. Temlyakov

2014:05 Incremental Greedy Algorithm and its Applications in Numerical Integration. V. Temlyakov INTERDISCIPLINARY MATHEMATICS INSTITUTE 2014:05 Incremental Greedy Algorithm and its Applications in Numerical Integration V. Temlyakov IMI PREPRINT SERIES COLLEGE OF ARTS AND SCIENCES UNIVERSITY OF SOUTH

More information

8 Singular Integral Operators and L p -Regularity Theory

8 Singular Integral Operators and L p -Regularity Theory 8 Singular Integral Operators and L p -Regularity Theory 8. Motivation See hand-written notes! 8.2 Mikhlin Multiplier Theorem Recall that the Fourier transformation F and the inverse Fourier transformation

More information

Bilinear Stochastic Elliptic Equations

Bilinear Stochastic Elliptic Equations Bilinear Stochastic Elliptic Equations S. V. Lototsky and B. L. Rozovskii Contents 1. Introduction (209. 2. Weighted Chaos Spaces (210. 3. Abstract Elliptic Equations (212. 4. Elliptic SPDEs of the Full

More information

Necessary conditions for convergence rates of regularizations of optimal control problems

Necessary conditions for convergence rates of regularizations of optimal control problems Necessary conditions for convergence rates of regularizations of optimal control problems Daniel Wachsmuth and Gerd Wachsmuth Johann Radon Institute for Computational and Applied Mathematics RICAM), Austrian

More information

1. Nonlinear Equations. This lecture note excerpted parts from Michael Heath and Max Gunzburger. f(x) = 0

1. Nonlinear Equations. This lecture note excerpted parts from Michael Heath and Max Gunzburger. f(x) = 0 Numerical Analysis 1 1. Nonlinear Equations This lecture note excerpted parts from Michael Heath and Max Gunzburger. Given function f, we seek value x for which where f : D R n R n is nonlinear. f(x) =

More information

Iterative Regularization Methods for Inverse Problems: Lecture 3

Iterative Regularization Methods for Inverse Problems: Lecture 3 Iterative Regularization Methods for Inverse Problems: Lecture 3 Thorsten Hohage Institut für Numerische und Angewandte Mathematik Georg-August Universität Göttingen Madrid, April 11, 2011 Outline 1 regularization

More information

FIXED POINT ITERATIONS

FIXED POINT ITERATIONS FIXED POINT ITERATIONS MARKUS GRASMAIR 1. Fixed Point Iteration for Non-linear Equations Our goal is the solution of an equation (1) F (x) = 0, where F : R n R n is a continuous vector valued mapping in

More information

Robust error estimates for regularization and discretization of bang-bang control problems

Robust error estimates for regularization and discretization of bang-bang control problems Robust error estimates for regularization and discretization of bang-bang control problems Daniel Wachsmuth September 2, 205 Abstract We investigate the simultaneous regularization and discretization of

More information

SPECTRAL THEOREM FOR COMPACT SELF-ADJOINT OPERATORS

SPECTRAL THEOREM FOR COMPACT SELF-ADJOINT OPERATORS SPECTRAL THEOREM FOR COMPACT SELF-ADJOINT OPERATORS G. RAMESH Contents Introduction 1 1. Bounded Operators 1 1.3. Examples 3 2. Compact Operators 5 2.1. Properties 6 3. The Spectral Theorem 9 3.3. Self-adjoint

More information

ARCHIVUM MATHEMATICUM (BRNO) Tomus 32 (1996), 13 { 27. ON THE OSCILLATION OF AN mth ORDER PERTURBED NONLINEAR DIFFERENCE EQUATION

ARCHIVUM MATHEMATICUM (BRNO) Tomus 32 (1996), 13 { 27. ON THE OSCILLATION OF AN mth ORDER PERTURBED NONLINEAR DIFFERENCE EQUATION ARCHIVUM MATHEMATICUM (BRNO) Tomus 32 (996), 3 { 27 ON THE OSCILLATION OF AN mth ORDER PERTURBED NONLINEAR DIFFERENCE EQUATION P. J. Y. Wong and R. P. Agarwal Abstract. We oer sucient conditions for the

More information

Pathwise Construction of Stochastic Integrals

Pathwise Construction of Stochastic Integrals Pathwise Construction of Stochastic Integrals Marcel Nutz First version: August 14, 211. This version: June 12, 212. Abstract We propose a method to construct the stochastic integral simultaneously under

More information

THE INVERSE FUNCTION THEOREM

THE INVERSE FUNCTION THEOREM THE INVERSE FUNCTION THEOREM W. PATRICK HOOPER The implicit function theorem is the following result: Theorem 1. Let f be a C 1 function from a neighborhood of a point a R n into R n. Suppose A = Df(a)

More information

A new ane scaling interior point algorithm for nonlinear optimization subject to linear equality and inequality constraints

A new ane scaling interior point algorithm for nonlinear optimization subject to linear equality and inequality constraints Journal of Computational and Applied Mathematics 161 (003) 1 5 www.elsevier.com/locate/cam A new ane scaling interior point algorithm for nonlinear optimization subject to linear equality and inequality

More information

ECON 616: Lecture 1: Time Series Basics

ECON 616: Lecture 1: Time Series Basics ECON 616: Lecture 1: Time Series Basics ED HERBST August 30, 2017 References Overview: Chapters 1-3 from Hamilton (1994). Technical Details: Chapters 2-3 from Brockwell and Davis (1987). Intuition: Chapters

More information

A Representation of Excessive Functions as Expected Suprema

A Representation of Excessive Functions as Expected Suprema A Representation of Excessive Functions as Expected Suprema Hans Föllmer & Thomas Knispel Humboldt-Universität zu Berlin Institut für Mathematik Unter den Linden 6 10099 Berlin, Germany E-mail: foellmer@math.hu-berlin.de,

More information

Preconditioned Newton methods for ill-posed problems

Preconditioned Newton methods for ill-posed problems Preconditioned Newton methods for ill-posed problems Dissertation zur Erlangung des Doktorgrades der Mathematisch-Naturwissenschaftlichen Fakultäten der Georg-August-Universität zu Göttingen vorgelegt

More information

Spurious Chaotic Solutions of Dierential. Equations. Sigitas Keras. September Department of Applied Mathematics and Theoretical Physics

Spurious Chaotic Solutions of Dierential. Equations. Sigitas Keras. September Department of Applied Mathematics and Theoretical Physics UNIVERSITY OF CAMBRIDGE Numerical Analysis Reports Spurious Chaotic Solutions of Dierential Equations Sigitas Keras DAMTP 994/NA6 September 994 Department of Applied Mathematics and Theoretical Physics

More information

[11] Peter Mathé and Ulrich Tautenhahn, Regularization under general noise assumptions, Inverse Problems 27 (2011), no. 3,

[11] Peter Mathé and Ulrich Tautenhahn, Regularization under general noise assumptions, Inverse Problems 27 (2011), no. 3, Literatur [1] Radu Boţ, Bernd Hofmann, and Peter Mathé, Regularizability of illposed problems and the modulus of continuity, Zeitschrift für Analysis und ihre Anwendungen. Journal of Analysis and its Applications

More information

October 7, :8 WSPC/WS-IJWMIP paper. Polynomial functions are renable

October 7, :8 WSPC/WS-IJWMIP paper. Polynomial functions are renable International Journal of Wavelets, Multiresolution and Information Processing c World Scientic Publishing Company Polynomial functions are renable Henning Thielemann Institut für Informatik Martin-Luther-Universität

More information

Lie Groups for 2D and 3D Transformations

Lie Groups for 2D and 3D Transformations Lie Groups for 2D and 3D Transformations Ethan Eade Updated May 20, 2017 * 1 Introduction This document derives useful formulae for working with the Lie groups that represent transformations in 2D and

More information

A Lower Bound Theorem. Lin Hu.

A Lower Bound Theorem. Lin Hu. American J. of Mathematics and Sciences Vol. 3, No -1,(January 014) Copyright Mind Reader Publications ISSN No: 50-310 A Lower Bound Theorem Department of Applied Mathematics, Beijing University of Technology,

More information

APPROXIMATING CONTINUOUS FUNCTIONS: WEIERSTRASS, BERNSTEIN, AND RUNGE

APPROXIMATING CONTINUOUS FUNCTIONS: WEIERSTRASS, BERNSTEIN, AND RUNGE APPROXIMATING CONTINUOUS FUNCTIONS: WEIERSTRASS, BERNSTEIN, AND RUNGE WILLIE WAI-YEUNG WONG. Introduction This set of notes is meant to describe some aspects of polynomial approximations to continuous

More information

COMPACT DIFFERENCE OF WEIGHTED COMPOSITION OPERATORS ON N p -SPACES IN THE BALL

COMPACT DIFFERENCE OF WEIGHTED COMPOSITION OPERATORS ON N p -SPACES IN THE BALL COMPACT DIFFERENCE OF WEIGHTED COMPOSITION OPERATORS ON N p -SPACES IN THE BALL HU BINGYANG and LE HAI KHOI Communicated by Mihai Putinar We obtain necessary and sucient conditions for the compactness

More information

A LOWER BOUND ON BLOWUP RATES FOR THE 3D INCOMPRESSIBLE EULER EQUATION AND A SINGLE EXPONENTIAL BEALE-KATO-MAJDA ESTIMATE. 1.

A LOWER BOUND ON BLOWUP RATES FOR THE 3D INCOMPRESSIBLE EULER EQUATION AND A SINGLE EXPONENTIAL BEALE-KATO-MAJDA ESTIMATE. 1. A LOWER BOUND ON BLOWUP RATES FOR THE 3D INCOMPRESSIBLE EULER EQUATION AND A SINGLE EXPONENTIAL BEALE-KATO-MAJDA ESTIMATE THOMAS CHEN AND NATAŠA PAVLOVIĆ Abstract. We prove a Beale-Kato-Majda criterion

More information

Density results for frames of exponentials

Density results for frames of exponentials Density results for frames of exponentials P. G. Casazza 1, O. Christensen 2, S. Li 3, and A. Lindner 4 1 Department of Mathematics, University of Missouri Columbia, Mo 65211 USA pete@math.missouri.edu

More information

at time t, in dimension d. The index i varies in a countable set I. We call configuration the family, denoted generically by Φ: U (x i (t) x j (t))

at time t, in dimension d. The index i varies in a countable set I. We call configuration the family, denoted generically by Φ: U (x i (t) x j (t)) Notations In this chapter we investigate infinite systems of interacting particles subject to Newtonian dynamics Each particle is characterized by its position an velocity x i t, v i t R d R d at time

More information

sparse and low-rank tensor recovery Cubic-Sketching

sparse and low-rank tensor recovery Cubic-Sketching Sparse and Low-Ran Tensor Recovery via Cubic-Setching Guang Cheng Department of Statistics Purdue University www.science.purdue.edu/bigdata CCAM@Purdue Math Oct. 27, 2017 Joint wor with Botao Hao and Anru

More information

08a. Operators on Hilbert spaces. 1. Boundedness, continuity, operator norms

08a. Operators on Hilbert spaces. 1. Boundedness, continuity, operator norms (February 24, 2017) 08a. Operators on Hilbert spaces Paul Garrett garrett@math.umn.edu http://www.math.umn.edu/ garrett/ [This document is http://www.math.umn.edu/ garrett/m/real/notes 2016-17/08a-ops

More information

and finally, any second order divergence form elliptic operator

and finally, any second order divergence form elliptic operator Supporting Information: Mathematical proofs Preliminaries Let be an arbitrary bounded open set in R n and let L be any elliptic differential operator associated to a symmetric positive bilinear form B

More information

An introduction to Birkhoff normal form

An introduction to Birkhoff normal form An introduction to Birkhoff normal form Dario Bambusi Dipartimento di Matematica, Universitá di Milano via Saldini 50, 0133 Milano (Italy) 19.11.14 1 Introduction The aim of this note is to present an

More information

arxiv: v5 [math.na] 16 Nov 2017

arxiv: v5 [math.na] 16 Nov 2017 RANDOM PERTURBATION OF LOW RANK MATRICES: IMPROVING CLASSICAL BOUNDS arxiv:3.657v5 [math.na] 6 Nov 07 SEAN O ROURKE, VAN VU, AND KE WANG Abstract. Matrix perturbation inequalities, such as Weyl s theorem

More information

A G Ramm, Implicit Function Theorem via the DSM, Nonlinear Analysis: Theory, Methods and Appl., 72, N3-4, (2010),

A G Ramm, Implicit Function Theorem via the DSM, Nonlinear Analysis: Theory, Methods and Appl., 72, N3-4, (2010), A G Ramm, Implicit Function Theorem via the DSM, Nonlinear Analysis: Theory, Methods and Appl., 72, N3-4, (21), 1916-1921. 1 Implicit Function Theorem via the DSM A G Ramm Department of Mathematics Kansas

More information

Estimates for probabilities of independent events and infinite series

Estimates for probabilities of independent events and infinite series Estimates for probabilities of independent events and infinite series Jürgen Grahl and Shahar evo September 9, 06 arxiv:609.0894v [math.pr] 8 Sep 06 Abstract This paper deals with finite or infinite sequences

More information

1 Lyapunov theory of stability

1 Lyapunov theory of stability M.Kawski, APM 581 Diff Equns Intro to Lyapunov theory. November 15, 29 1 1 Lyapunov theory of stability Introduction. Lyapunov s second (or direct) method provides tools for studying (asymptotic) stability

More information

Stochastic dominance with imprecise information

Stochastic dominance with imprecise information Stochastic dominance with imprecise information Ignacio Montes, Enrique Miranda, Susana Montes University of Oviedo, Dep. of Statistics and Operations Research. Abstract Stochastic dominance, which is

More information

Optimal spectral shrinkage and PCA with heteroscedastic noise

Optimal spectral shrinkage and PCA with heteroscedastic noise Optimal spectral shrinage and PCA with heteroscedastic noise William Leeb and Elad Romanov Abstract This paper studies the related problems of denoising, covariance estimation, and principal component

More information

A Concise Course on Stochastic Partial Differential Equations

A Concise Course on Stochastic Partial Differential Equations A Concise Course on Stochastic Partial Differential Equations Michael Röckner Reference: C. Prevot, M. Röckner: Springer LN in Math. 1905, Berlin (2007) And see the references therein for the original

More information

ON THE HÖLDER CONTINUITY OF MATRIX FUNCTIONS FOR NORMAL MATRICES

ON THE HÖLDER CONTINUITY OF MATRIX FUNCTIONS FOR NORMAL MATRICES Volume 10 (2009), Issue 4, Article 91, 5 pp. ON THE HÖLDER CONTINUITY O MATRIX UNCTIONS OR NORMAL MATRICES THOMAS P. WIHLER MATHEMATICS INSTITUTE UNIVERSITY O BERN SIDLERSTRASSE 5, CH-3012 BERN SWITZERLAND.

More information

Closest Moment Estimation under General Conditions

Closest Moment Estimation under General Conditions Closest Moment Estimation under General Conditions Chirok Han Victoria University of Wellington New Zealand Robert de Jong Ohio State University U.S.A October, 2003 Abstract This paper considers Closest

More information

Title: Localized self-adjointness of Schrödinger-type operators on Riemannian manifolds. Proposed running head: Schrödinger-type operators on

Title: Localized self-adjointness of Schrödinger-type operators on Riemannian manifolds. Proposed running head: Schrödinger-type operators on Title: Localized self-adjointness of Schrödinger-type operators on Riemannian manifolds. Proposed running head: Schrödinger-type operators on manifolds. Author: Ognjen Milatovic Department Address: Department

More information

Zeros of lacunary random polynomials

Zeros of lacunary random polynomials Zeros of lacunary random polynomials Igor E. Pritsker Dedicated to Norm Levenberg on his 60th birthday Abstract We study the asymptotic distribution of zeros for the lacunary random polynomials. It is

More information

Vector spaces. DS-GA 1013 / MATH-GA 2824 Optimization-based Data Analysis.

Vector spaces. DS-GA 1013 / MATH-GA 2824 Optimization-based Data Analysis. Vector spaces DS-GA 1013 / MATH-GA 2824 Optimization-based Data Analysis http://www.cims.nyu.edu/~cfgranda/pages/obda_fall17/index.html Carlos Fernandez-Granda Vector space Consists of: A set V A scalar

More information

Real Analysis Notes. Thomas Goller

Real Analysis Notes. Thomas Goller Real Analysis Notes Thomas Goller September 4, 2011 Contents 1 Abstract Measure Spaces 2 1.1 Basic Definitions........................... 2 1.2 Measurable Functions........................ 2 1.3 Integration..............................

More information

Economics 472. Lecture 10. where we will refer to y t as a m-vector of endogenous variables, x t as a q-vector of exogenous variables,

Economics 472. Lecture 10. where we will refer to y t as a m-vector of endogenous variables, x t as a q-vector of exogenous variables, University of Illinois Fall 998 Department of Economics Roger Koenker Economics 472 Lecture Introduction to Dynamic Simultaneous Equation Models In this lecture we will introduce some simple dynamic simultaneous

More information

A class of non-convex polytopes that admit no orthonormal basis of exponentials

A class of non-convex polytopes that admit no orthonormal basis of exponentials A class of non-convex polytopes that admit no orthonormal basis of exponentials Mihail N. Kolountzakis and Michael Papadimitrakis 1 Abstract A conjecture of Fuglede states that a bounded measurable set

More information

Weighted Sums of Orthogonal Polynomials Related to Birth-Death Processes with Killing

Weighted Sums of Orthogonal Polynomials Related to Birth-Death Processes with Killing Advances in Dynamical Systems and Applications ISSN 0973-5321, Volume 8, Number 2, pp. 401 412 (2013) http://campus.mst.edu/adsa Weighted Sums of Orthogonal Polynomials Related to Birth-Death Processes

More information

Conditional stability versus ill-posedness for operator equations with monotone operators in Hilbert space

Conditional stability versus ill-posedness for operator equations with monotone operators in Hilbert space Conditional stability versus ill-posedness for operator equations with monotone operators in Hilbert space Radu Ioan Boț and Bernd Hofmann September 16, 2016 Abstract In the literature on singular perturbation

More information

Inverse Transport Problems and Applications. II. Optical Tomography and Clear Layers. Guillaume Bal

Inverse Transport Problems and Applications. II. Optical Tomography and Clear Layers. Guillaume Bal Inverse Transport Problems and Applications II. Optical Tomography and Clear Layers Guillaume Bal Department of Applied Physics & Applied Mathematics Columbia University http://www.columbia.edu/ gb23 gb23@columbia.edu

More information

JUHA KINNUNEN. Harmonic Analysis

JUHA KINNUNEN. Harmonic Analysis JUHA KINNUNEN Harmonic Analysis Department of Mathematics and Systems Analysis, Aalto University 27 Contents Calderón-Zygmund decomposition. Dyadic subcubes of a cube.........................2 Dyadic cubes

More information

Iterative Solution of a Matrix Riccati Equation Arising in Stochastic Control

Iterative Solution of a Matrix Riccati Equation Arising in Stochastic Control Iterative Solution of a Matrix Riccati Equation Arising in Stochastic Control Chun-Hua Guo Dedicated to Peter Lancaster on the occasion of his 70th birthday We consider iterative methods for finding the

More information

SPECTRAL PROPERTIES OF THE LAPLACIAN ON BOUNDED DOMAINS

SPECTRAL PROPERTIES OF THE LAPLACIAN ON BOUNDED DOMAINS SPECTRAL PROPERTIES OF THE LAPLACIAN ON BOUNDED DOMAINS TSOGTGEREL GANTUMUR Abstract. After establishing discrete spectra for a large class of elliptic operators, we present some fundamental spectral properties

More information

Spectral Gap and Concentration for Some Spherically Symmetric Probability Measures

Spectral Gap and Concentration for Some Spherically Symmetric Probability Measures Spectral Gap and Concentration for Some Spherically Symmetric Probability Measures S.G. Bobkov School of Mathematics, University of Minnesota, 127 Vincent Hall, 26 Church St. S.E., Minneapolis, MN 55455,

More information

Functions: A Fourier Approach. Universitat Rostock. Germany. Dedicated to Prof. L. Berg on the occasion of his 65th birthday.

Functions: A Fourier Approach. Universitat Rostock. Germany. Dedicated to Prof. L. Berg on the occasion of his 65th birthday. Approximation Properties of Multi{Scaling Functions: A Fourier Approach Gerlind Plona Fachbereich Mathemati Universitat Rostoc 1851 Rostoc Germany Dedicated to Prof. L. Berg on the occasion of his 65th

More information

NORMS ON SPACE OF MATRICES

NORMS ON SPACE OF MATRICES NORMS ON SPACE OF MATRICES. Operator Norms on Space of linear maps Let A be an n n real matrix and x 0 be a vector in R n. We would like to use the Picard iteration method to solve for the following system

More information

arxiv: v1 [math.na] 21 Aug 2014 Barbara Kaltenbacher

arxiv: v1 [math.na] 21 Aug 2014 Barbara Kaltenbacher ENHANCED CHOICE OF THE PARAMETERS IN AN ITERATIVELY REGULARIZED NEWTON- LANDWEBER ITERATION IN BANACH SPACE arxiv:48.526v [math.na] 2 Aug 24 Barbara Kaltenbacher Alpen-Adria-Universität Klagenfurt Universitätstrasse

More information

Nonlinear Analysis 71 (2009) Contents lists available at ScienceDirect. Nonlinear Analysis. journal homepage:

Nonlinear Analysis 71 (2009) Contents lists available at ScienceDirect. Nonlinear Analysis. journal homepage: Nonlinear Analysis 71 2009 2744 2752 Contents lists available at ScienceDirect Nonlinear Analysis journal homepage: www.elsevier.com/locate/na A nonlinear inequality and applications N.S. Hoang A.G. Ramm

More information

Generalization of Hensel lemma: nding of roots of p-adic Lipschitz functions

Generalization of Hensel lemma: nding of roots of p-adic Lipschitz functions Generalization of Hensel lemma: nding of roots of p-adic Lipschitz functions (joint talk with Andrei Khrennikov) Dr. Ekaterina Yurova Axelsson Linnaeus University, Sweden September 8, 2015 Outline Denitions

More information

CUTOFF RESOLVENT ESTIMATES AND THE SEMILINEAR SCHRÖDINGER EQUATION

CUTOFF RESOLVENT ESTIMATES AND THE SEMILINEAR SCHRÖDINGER EQUATION CUTOFF RESOLVENT ESTIMATES AND THE SEMILINEAR SCHRÖDINGER EQUATION HANS CHRISTIANSON Abstract. This paper shows how abstract resolvent estimates imply local smoothing for solutions to the Schrödinger equation.

More information