ANALYSIS OF THE MINIMAL RESIDUAL METHOD APPLIED TO ILL-POSED OPTIMALITY SYSTEMS


BJØRN FREDRIK NIELSEN¹ AND KENT-ANDRE MARDAL²

Abstract. We analyze the performance of the Minimal Residual Method applied to linear Karush-Kuhn-Tucker (KKT) systems arising in connection with inverse problems. Such optimality systems typically have a saddle point structure and have unique solutions for all α > 0, where α is the parameter employed in the Tikhonov regularization. Unfortunately, the associated spectral condition number is very large for small values of α, which strongly indicates that their numerical treatment is difficult. Our main result shows that a broad range of linear ill-posed optimality systems can be solved with a number of iterations of order O(ln(α^{-1})). More precisely, in the severely ill-posed case the number of iterations needed by the Minimal Residual Method cannot grow faster than O(ln(α^{-1})) as α → 0. This result is obtained by carefully analyzing the spectrum of the associated saddle point operator: except for a few isolated eigenvalues, the spectrum consists of bounded intervals. Krylov subspace methods handle such problems very well. We illuminate our theoretical findings with numerical results for inverse problems involving partial differential equations. Our investigation is inspired by Prof. H. Egger's discussion of similar results valid for the conjugate gradient algorithm applied to the normal equations.

Date: March 2012.
Key words and phrases. Inverse problems, all-at-once, Karush-Kuhn-Tucker (KKT) systems, Krylov subspace methods.

1. Introduction

In recent years many researchers have studied the numerical treatment of inverse problems, especially the minimization of quadratic cost functionals with constraints expressed in terms of partial differential equations (PDEs), so-called PDE-constrained optimization [8, 9, 17, 23, 39]. Problems of this type can be solved with iterative minimization methods, or one can use the Lagrange multiplier technique to obtain a system of equations which must be satisfied by the optimal solution. If the latter approach is applied, discretization leads to a large system of algebraic equations, and the optimization problem can be solved with an all-at-once method. That is, a method in which the optimality condition, the state equation and its adjoint, which together constitute the optimality system, are solved in a fully implicit manner.

The optimality system typically inherits the ill-posed nature of the underlying inverse problem. Regularization techniques must therefore be invoked, but the stability of the system deteriorates as the regularization parameter α > 0 approaches zero. The condition number of the discretized optimality

system is consequently large for small values of α. In addition, the condition number will also typically increase significantly as the mesh parameter h > 0, used in the discretization of the involved PDE, decreases.

The purpose of this paper is to analyze the Minimal Residual Method [30] applied to a large class of ill-posed linear optimality systems. Let H_1 be the parameter/control space, H_2 the state space, and H_3 the observation space, with norms ||·||_{H_1}, ||·||_{H_2} and ||·||_{H_3}. We study the numerical treatment of

(1)  min_{v ∈ H_1, u ∈ H_2} { (1/2)||Tu − d||^2_{H_3} + (α/2)||v||^2_{H_1} }

subject to

(2)  Au = Bv  (state equation).

The quantity d is given and α ≥ 0 is a regularization parameter. Problems of this type typically arise when one wants to use an observation d to recover the parameter/control v in the state equation. If

(3)  A_α p = b

is the optimality system associated with (1)-(2), then the main objectives of this text can be roughly formulated as follows:

a): We prove that the spectrum of A_α is almost contained in bounded intervals:

sp(A_α) ⊂ [−b, −a] ∪ [cα, dα] ∪ {λ_1, λ_2, ..., λ_{N(α)}} ∪ [a, b],

where N(α) is of order O(ln(α^{-1})) in the severely ill-posed case and a, b, c, d > 0 are constants that do not depend on α. Krylov subspace solvers are well known to handle problems with few isolated eigenvalues excellently, see e.g. [4]. This matter is discussed in detail for the Minimal Residual Method in this paper. We have earlier studied a particular preconditioning strategy for ill-posed KKT systems that leads to (theoretical) iteration counts of order O((ln(α))^2), see [29]. The present text may be regarded as a follow-up to that article.

b): Through numerical examples we show how a) can be employed to solve PDE-constrained optimization problems efficiently.

Our investigation is presented in terms of functional analysis and uses basic properties of inverse problems. The results are therefore applicable whenever a problem can be written in the form (1)-(2). Nevertheless, we were motivated by practical experience with PDE-constrained optimization, and this text is primarily a contribution to the knowledge about such problems.

The numerical treatment of saddle point operators arising in connection with PDEs is a contemporary research field [2, 6, 11, 15, 16, 21, 33, 36, 40]; see [6] for a rather recent review. For well-posed problems one of the main issues is to obtain iteration counts that are acceptable as the

mesh parameter h > 0 decreases. If the parameter identification task at hand is ill posed, then one must also ensure that the iterative schemes can handle cases with small regularization parameters, i.e. that the number of iterations needed does not increase significantly as α → 0. The latter type of problem has been addressed in many papers for various models [1, 7, 22, 27, 31, 32, 34, 35, 37], i.e. for special cases of elliptic and parabolic control problems. For some particular cost functionals, remarkable algorithms that are completely robust with respect to α have been developed [34, 35, 38, 41]. In this paper an abstract approach is used to cover a rather broad range of saddle point problems, and we conclude that Krylov subspace solvers might be an attractive alternative for their efficient numerical solution.

If one wants to solve a practical problem involving real-world data, it is almost certainly not sufficient to solve (3) once with one particular choice of α. In fact, procedures for estimating an appropriate size of the regularization parameter typically require that (3) is solved repeatedly for a sequence of different values of α, see e.g. [20]. We may thus conclude that the efficiency needed to solve an inverse problem is of a different magnitude than what is required for a well-posed problem; the fast numerical solution of (3) is crucial.

The next section contains all but one of the assumptions that we need. Notation for the optimality system is introduced in Section 3, which also contains the final assumption. The performance of the conjugate gradient method applied to the normal equations associated with (1)-(2) is briefly discussed in Section 4, and Section 5 is devoted to the eigenvalue distribution of the indefinite optimality system (3). Our numerical experiments are presented in Section 6, and the theoretical convergence behavior of the Minimal Residual Method is analyzed in Section 7.

2. Assumptions

Throughout this text c, c̃, C and C̃ are (generic) positive constants that do not depend on the regularization parameter α. We limit our analysis to linear state equations (2) and assume that

A1: A : H_2 → H_2 is bounded and linear¹,
A2: A^{-1} exists and is bounded,
A3: B : H_1 → H_2 is bounded and linear,
A4: T : H_2 → H_3 is bounded and linear, and
A5: the inf-sup condition holds:

(4)  inf_{w ∈ H_2} sup_{(v,u) ∈ H_1 × H_2} ((Bv, w)_{H_2} + (Au, w)_{H_2}) / ((||v||_{H_1} + ||u||_{H_2}) ||w||_{H_2}) ≥ c.

¹ If the state equation (2) is a PDE, then A will typically be a mapping from H_2 to its dual space H_2'. An operator R^{-1} : H_2' → H_2 must then be applied to the state equation in order to get a mapping R^{-1}A : H_2 → H_2. One may regard R^{-1} as a preconditioner [26]. In the present text, R^{-1} will be the inverse of the Riesz map or a suitable multigrid approximation of this operator. We will return to this issue in the numerical experiment section.
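The role of the Riesz map can be illustrated numerically. The following sketch is our own illustration, not code from the paper: for a one dimensional analogue of the variable coefficient state operator used in Example 2 below, it computes the generalized eigenvalues of Âu = λRu, where R = K + M (stiffness plus mass) is the discrete Riesz map of H^1. These eigenvalues, i.e. the spectrum of R^{-1}Â, stay in a fixed interval as the mesh is refined, which is precisely the property that makes R^{-1}, or a multigrid approximation of it, a useful preconditioner. All function names and parameter values below are our own choices.

```python
# Our own illustration (not from the paper): the generalized eigenvalues of
# \hat{A}u = lambda * R u, with R = K + M the discrete Riesz map of H^1,
# remain in a fixed interval under mesh refinement (spectral equivalence).
import numpy as np
from scipy.linalg import eigh

def p1_matrices(N, coeff):
    """Assemble 1D P1 stiffness (with coefficient) and mass matrices on [0, 1]."""
    h = 1.0 / N
    x = np.linspace(0.0, 1.0, N + 1)
    K = np.zeros((N + 1, N + 1))
    M = np.zeros((N + 1, N + 1))
    for e in range(N):
        k = coeff(0.5 * (x[e] + x[e + 1]))        # coefficient at element midpoint
        idx = np.ix_([e, e + 1], [e, e + 1])
        K[idx] += (k / h) * np.array([[1.0, -1.0], [-1.0, 1.0]])
        M[idx] += (h / 6.0) * np.array([[2.0, 1.0], [1.0, 2.0]])
    return K, M

coeff = lambda s: 2.0 + np.sin(2.0 * np.pi * s)   # cf. Example 2 below
for N in [16, 32, 64, 128]:
    Kvar, M = p1_matrices(N, coeff)
    Ahat = Kvar + M                               # weak form of -(k u')' + u
    R = p1_matrices(N, lambda s: 1.0)[0] + M      # Riesz map K + M for H^1
    lam = eigh(Ahat, R, eigvals_only=True)        # spectrum of R^{-1} Ahat
    print(f"N={N:4d}: sp(R^-1 Ahat) in [{lam.min():.3f}, {lam.max():.3f}]")
```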

Note that A is the mapping that must be inverted in order to solve the state equation, B maps the (unknown) parameter/control v into the state equation, and T is the observation operator. From A2 and A3 it follows that the solution u of (2) depends continuously on v:

(5)  ||u||_{H_2} ≤ C ||v||_{H_1},

i.e. the state equation is well posed.

3. Optimality system and one more assumption

The optimality system associated with (1)-(2) can be derived by employing standard techniques. Details about this issue can be found in Appendix A. Here we merely state the result. That is, the solution of (1)-(2) must solve the following problem: find (v, u, w) ∈ H_1 × H_2 × H_2 such that

(6)  [ αI   0     −B^* ] [ v ]   [ 0     ]
     [ 0    T^*T   A^* ] [ u ] = [ T^*d  ]
     [ −B   A      0   ] [ w ]   [ 0     ]

where w is the Lagrange multiplier and the ^* notation denotes the adjoint. It is well known that such systems typically are indefinite and hence have both positive and negative eigenvalues. For the sake of convenience, we introduce the notation

(7)  A_α = [ αI  0  −B^* ;  0  T^*T  A^* ;  −B  A  0 ],  p = (v, u, w)^T,  b = (0, T^*d, 0)^T,

and (6) can be written in the form

(8)  A_α p = b.

Note that, for α ≥ 0,

A_α : (H_1 × H_2 × H_2) → (H_1 × H_2 × H_2),

and that for α = 0 we get

(9)  A_0 = [ 0  0  −B^* ;  0  T^*T  A^* ;  −B  A  0 ],

which contains zero regularization. Throughout this text we consider problems of the form (1)-(2) that are ill posed for α = 0. This undesirable property is likely to be inherited by the optimality system (6):

A6: We assume that A_0 is a compact operator, i.e. λ_i(A_0) → 0 as i → ∞. Furthermore, in the severely ill-posed case the eigenvalues are assumed to satisfy

(10)  |λ_i(A_0)| ≤ c e^{-Ci}  for i = 1, 2, ....

Here, c and C are positive constants not depending on α, since A_0 does not involve α. We also assume that the spectrum of A_α is discrete/countable for every α > 0.

If, unexpectedly, the ill-posedness of (1)-(2) for α = 0 is not inherited by (6), then the numerical solution of the latter is of course much easier. In a more general version of A6 one could assume that the spectrum of A_0 only contains a subsequence that fulfills a bound of the form (10). This, however, leads to an even more involved analysis, which would have to address non-compact operators. (Moreover, for problems with continuous spectra it is not even straightforward to define the concepts severely and mildly ill posed [24].)

The main purpose of this text is to analyze the convergence behavior of the Minimal Residual Method applied to (8). Our main result shows that the number of iterations required by this algorithm cannot grow faster than O(ln(α^{-1})) as α → 0. In many ways one may regard our approach as a generalization of a similar investigation of the Conjugate Gradient (CG) method, which is only applicable to positive definite problems. We therefore begin with a brief study of the convergence properties of the CG algorithm applied to severely ill-posed equations subject to Tikhonov regularization.

4. Conjugate Gradient method

If A can be efficiently inverted, then it turns out that we can conveniently solve (1)-(2) with the CG method. More specifically, (2) implies that

Tu = TA^{-1}Bv = Fv,

where

F = TA^{-1}B : H_1 → H_3

is the direct/forward mapping. Consequently, we may write (1)-(2) in the form

min_{v ∈ H_1} { (1/2)||Fv − d||^2_{H_3} + (α/2)||v||^2_{H_1} },

which gives us the normal equations

(11)  F^*Fv + αIv = F^*d.

Clearly, F^*F + αI : H_1 → H_1 is symmetric and positive definite for every α > 0. This problem can therefore be solved with the CG method.
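As a concrete illustration, the following synthetic sketch (not the authors' code) fabricates a forward map F whose singular values decay like e^{-Ci} and solves the regularized normal equations (11) with CG for a decreasing sequence of regularization parameters. All sizes and constants are assumptions made for the demo.

```python
# Synthetic sketch (assumed data, not the authors' code): CG applied to the
# Tikhonov normal equations (11) for a severely ill-posed forward map F.
import numpy as np
import scipy.sparse.linalg as spla

n, C = 400, 0.4
rng = np.random.default_rng(0)
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
s = np.exp(-C * np.arange(1, n + 1))          # singular values ~ e^{-Ci}, cf. (12) below
F = (U * s) @ V.T                             # F = U diag(s) V^T
d = rng.standard_normal(n)                    # synthetic observation

for alpha in [1e-2, 1e-4, 1e-6, 1e-8]:
    # normal operator F^*F + alpha*I, applied matrix-free
    H = spla.LinearOperator((n, n), matvec=lambda v, a=alpha: F.T @ (F @ v) + a * v)
    its = 0
    def count(xk):
        global its
        its += 1
    v, info = spla.cg(H, F.T @ d, rtol=1e-8, atol=0.0, callback=count)  # rtol: scipy >= 1.12
    print(f"alpha={alpha:.0e}: converged={info == 0}, CG iterations={its}")
```

On a typical run the iteration counts grow only mildly, roughly linearly in ln(α^{-1}), even though the condition number of the normal operator grows like α^{-1}; this is exactly the behavior explained in the next subsection.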

4.1. Convergence behavior. The material presented in this subsection is inspired by Prof. H. Egger's talk at the Applied Inverse Problems conference in Vienna in 2009. Prof. Egger has published several papers addressing preconditioning issues for inverse problems [12, 13].

We assume that the spectrum of F^*F is discrete and that (11) is severely ill posed for α = 0. That is, the eigenvalues of F^*F, sorted in decreasing order, satisfy

(12)  0 ≤ λ_i(F^*F) ≤ c e^{-Ci}  for i = 1, 2, ....

The eigenvalues of F^*F + αI then obviously satisfy

λ_i(F^*F + αI) = λ_i(F^*F) + α,

and invoking (12) yields

(13)  α ≤ λ_i(F^*F + αI) ≤ c e^{-Ci} + α  for i = 1, 2, ....

In the rest of this subsection we simply write λ_i instead of λ_i(F^*F + αI). From (13) it is evident that the number of eigenvalues that are larger than 2α is of order O(ln(α^{-1})):

(14)  sp(F^*F + αI) ⊂ [α, 2α] ∪ {λ_1, λ_2, ..., λ_{N(α)}},
      λ_i > 2α  for i = 1, 2, ..., N(α),
      N(α) ≤ (ln(c) − ln(α))/C = O(ln(α^{-1})),

where N(α) is the integer such that λ_1 ≥ λ_2 ≥ ... ≥ λ_{N(α)} > 2α ≥ λ_{N(α)+1} ≥ ....

The CG method is known to handle systems with a spectrum of the kind (14) very well; see Axelsson and Lindskog [5]. Their result is formulated in terms of an error tolerance ε, the exact solution v of the regularized normal equations (11) and the energy norm

||x||_E = ((F^*F + αI)x, x)^{1/2}_{H_1},  x ∈ H_1.

Let v_0 and v_k denote the initial guess and the kth approximation generated by the CG algorithm applied to (11), respectively. According to Axelsson and Lindskog's paper [5],

(15)  ||v − v_k||_E / ||v − v_0||_E ≤ ε  for  k = N(α) + k_2(α, 2α, ε),

where

k_2(α, 2α, ε) = ⌈ ln(2/ε) / ln(τ^{-1}) ⌉

and

τ = (1 − √(α/(2α))) / (1 + √(α/(2α))) = (1 − 1/√2) / (1 + 1/√2).

Since τ does not depend on α, neither does k_2. We conclude that the number of CG iterations needed to reduce the error by a factor of ε cannot grow

faster than N(α), which is of order O(ln(α^{-1})), as α → 0. Hence, we only get logarithmic growth.

Standard CG convergence theory states that the error reduction (15) is achieved in at most O(√K) iterations, where K is the spectral condition number of F^*F + αI, see e.g. [4]. Because K = O(1/α), we conclude that Axelsson and Lindskog's approach reveals that the classical analysis provides a very pessimistic estimate.

In order to use the approach described above one must compute A^{-1}, i.e. the operator on the left-hand side of the state equation (2) must be inverted. This might be very CPU demanding and in many cases impossible or inconvenient, especially if the state equation (2) is a complicated mathematical model. The explicit inversion of A can be avoided by solving (8) instead of (11). However, A_α is indefinite and we cannot use the CG algorithm. In the subsequent sections we will analyze the performance of the Minimal Residual Method applied to (8).

Remark. If one insists on using the CG scheme with iteration counts of order O(ln(α^{-1})) as α → 0, but does not want to compute A^{-1}, this can also be accomplished. Consider equation (8) with no regularization, A_0 p = b, where A_0 is defined in (9). From assumption A6 it follows that the eigenvalues of A_0^*A_0 satisfy bounds similar to (12). We therefore conclude that the number of CG iterations needed to solve

A_0^*A_0 p + αIp = A_0^*b

cannot grow faster than O(ln(α^{-1})) as α → 0. In this equation I is the identity operator. However, A_α is Hermitian and this fact should of course be exploited.

5. Minimal Residual Method

The main purpose of this paper is to analyze the performance of the Minimal Residual Method applied to the saddle point problem (6), or equivalently, applied to (8). Our approach is similar to the study of the CG algorithm presented above: we characterize the basic structure of the spectrum of A_α, defined in (7), and use that knowledge to analyze the convergence behavior as α → 0.

5.1. Basic bounds. In Appendix B standard techniques for saddle point operators are used to obtain bounds for the operator norms of A_α and A_α^{-1}:

||A_α|| ≤ C  for all α ∈ [0, 1],
||A_α^{-1}|| ≤ 1/(cα)  for all α ∈ (0, 1].

We will now employ this information to analyze the eigenvalues of A_α. If A_α q = λq then

|λ| ||q|| = ||A_α q||,

or

|λ| = ||A_α q|| / ||q|| ≤ ||A_α|| ≤ C.

Likewise,

A_α q = λq

implies that

(1/λ) q = A_α^{-1} q.

That is,

(1/|λ|) ||q|| = ||A_α^{-1} q|| ≤ ||A_α^{-1}|| ||q||,

or

|λ| ≥ 1/||A_α^{-1}|| ≥ cα.

Lemma 5.1. Let A_α be the operator defined in (7). There exist constants c, C > 0, which are independent of α ∈ [0, 1], such that

cα ≤ |λ_i(A_α)| ≤ C  for i = 1, 2, ....

Note that these bounds also hold for α = 0.

5.2. Negative eigenvalues. The next step is to prove that the negative eigenvalues of A_α are well behaved for all α ≥ 0. More specifically, we will show that the negative eigenvalues cannot approach zero as α → 0. Our analysis employs the following auxiliary result:

Lemma 5.2. Let A be the operator in the state equation (2). Assumptions A1 and A2 imply that AA^* is coercive, i.e. there exists a constant c > 0 such that

(16)  (AA^*φ, φ) ≥ c ||φ||^2_{H_2}  for all φ ∈ H_2.

Proof. Assume that there does not exist a constant c > 0 such that (16) holds. Then there exists a sequence {φ_n}, with ||φ_n||_{H_2} = 1, such that

(AA^*φ_n, φ_n) → 0  as n → ∞.

Let us consider ||(A^*)^{-1} y||^2_{H_2} / ||y||^2_{H_2} with y = y_n = A^*φ_n:

||(A^*)^{-1} y_n||^2_{H_2} / ||y_n||^2_{H_2} = ((A^*)^{-1} y_n, (A^*)^{-1} y_n) / (y_n, y_n)
  = (φ_n, φ_n) / (A^*φ_n, A^*φ_n)
  = (φ_n, φ_n) / (AA^*φ_n, φ_n)
  = 1 / (AA^*φ_n, φ_n) → ∞  as n → ∞.

Hence, (A^*)^{-1} is not bounded, which contradicts assumption A2. We conclude that AA^* must be coercive. □

(This lemma can also be established by using the Bounded Inverse Theorem.) The result regarding the negative eigenvalues of A_α reads:

Lemma 5.3. There exist constants a, b > 0 such that all the negative eigenvalues of A_α are contained in the interval [−b, −a]. These constants do not depend on the size of the regularization parameter α ∈ [0, 1]. (Note that [−b, −a] also contains all the negative eigenvalues of A_0.)

Proof. Assume that λ < 0 is a negative eigenvalue of A_α with associated eigenfunction (v, u, w)^T, i.e.

[ αI  0  −B^* ;  0  T^*T  A^* ;  −B  A  0 ] (v, u, w)^T = λ (v, u, w)^T,

or

(17)  αv − B^*w = λv,
(18)  T^*Tu + A^*w = λu,
(19)  −Bv + Au = λw.

Since λ < 0, λI − T^*T is invertible and it follows from (17) and (18) that

v = −(1/(λ − α)) B^*w,
u = (λI − T^*T)^{-1} A^*w.

Note that w = 0 implies that u = 0 and v = 0, and we may therefore assume that w ≠ 0. By inserting these expressions for v and u into (19) one finds that

(1/(λ − α)) BB^*w + A(λI − T^*T)^{-1} A^*w = λw,

or

(20)  λw = (1/(λ − α)) BB^*w − A(T^*T − λI)^{-1} A^*w.

The next step is to discuss the properties of T^*T − λI; thereafter we return to (20). Recall that λ < 0, and therefore T^*T − λI is positive definite. Lemma 5.1 states that

(21)  |λ| ≤ C.

We conclude that the spectrum sp(T^*T − λI) of T^*T − λI satisfies

sp(T^*T − λI) ⊂ (0, ||T||^2 + C].

It follows that (T^*T − λI)^{-1} also is positive definite and that

sp((T^*T − λI)^{-1}) ⊂ [1/(||T||^2 + C), ∞).

If we combine this information with equation (20) we find that

λ(w, w)_{H_2} = (1/(λ − α)) (BB^*w, w)_{H_2} − (A(T^*T − λI)^{-1} A^*w, w)_{H_2}
  = (1/(λ − α)) (B^*w, B^*w)_{H_1} − ((T^*T − λI)^{-1} A^*w, A^*w)_{H_2}
  ≤ −((T^*T − λI)^{-1} A^*w, A^*w)_{H_2}
  ≤ −(1/(||T||^2 + C)) (A^*w, A^*w)_{H_2}
  = −(1/(||T||^2 + C)) (AA^*w, w)_{H_2}
  ≤ −(c/(||T||^2 + C)) (w, w)_{H_2},

where we have used that λ − α < 0 and inequality (16) in Lemma 5.2. Consequently,

λ ≤ −c/(||T||^2 + C),

which together with (21) finishes the proof, i.e.

a = c/(||T||^2 + C),  b = C. □

5.3. Positive eigenvalues. The negative eigenvalues of A_α are well behaved regardless of the size of the regularization parameter α ∈ [0, 1]. On the other hand, we have assumed that zero is a cluster point of the spectrum of A_0, see assumption A6. We will now investigate in what sense this property of A_0 is inherited by the positive eigenvalues of A_α. Note that

A_α = A_0 + E_α,

where

E_α = [ αI  0  0 ;  0  0  0 ;  0  0  0 ],

and we conclude that the difference between A_α and A_0 is small, provided that α is small. Is the difference between the eigenvalues of these two operators also small? Yes indeed; as we will now briefly discuss, the Min-max theorem (Courant-Fischer-Weyl min-max principle) [28] provides a strategy for analyzing this issue.

Even though we have assumed that A_0 is compact, A_α will in general not be compact for α > 0. In the case of infinite dimensional spaces one can thus not (directly) apply the classical Courant-Fischer-Weyl min-max principle to A_α. For the sake of simplicity, we will therefore only address the finite dimensional setting, which allows the use of the principle.

(Because the min-max approach is also applicable to the discrete end of the spectrum of self-adjoint operators that are bounded below, similar results can be established in the infinite dimensional case. More specifically, both cI − A_α and cI − A_0 are bounded below for a sufficiently large constant c > 0.)

Let λ_1^+(A_α) ≥ λ_2^+(A_α) ≥ ... and λ_1^+(A_0) ≥ λ_2^+(A_0) ≥ ... be the non-negative eigenvalues of A_α and A_0, respectively, sorted in decreasing order. Note that A_α and A_0 are Hermitian. According to the Min-max theorem, see e.g. [28],

(22)  λ_k^+(A_α) = max_{S_k} min_{p ∈ S_k, ||p|| = 1} (A_α p, p),
(23)  λ_k^+(A_α) = min_{S_{k−1}} max_{p ⊥ S_{k−1}, ||p|| = 1} (A_α p, p),
(24)  λ_k^+(A_0) = max_{S_k} min_{p ∈ S_k, ||p|| = 1} (A_0 p, p),
(25)  λ_k^+(A_0) = min_{S_{k−1}} max_{p ⊥ S_{k−1}, ||p|| = 1} (A_0 p, p),

where S_k and S_{k−1} denote k-dimensional and (k−1)-dimensional subspaces of H_1 × H_2 × H_2, respectively. Now,

(A_α p, p) = (A_0 p, p) + (E_α p, p),

which implies that

max_{p ⊥ S_{k−1}, ||p|| = 1} (A_α p, p) = max_{p ⊥ S_{k−1}, ||p|| = 1} { (A_0 p, p) + (E_α p, p) }
  ≤ max_{p ⊥ S_{k−1}, ||p|| = 1} (A_0 p, p) + max_{p ⊥ S_{k−1}, ||p|| = 1} (E_α p, p)
  ≤ max_{p ⊥ S_{k−1}, ||p|| = 1} (A_0 p, p) + α,

and hence

min_{S_{k−1}} max_{p ⊥ S_{k−1}, ||p|| = 1} (A_α p, p) ≤ min_{S_{k−1}} max_{p ⊥ S_{k−1}, ||p|| = 1} (A_0 p, p) + α,

i.e. λ_k^+(A_α) ≤ λ_k^+(A_0) + α, where we have used (23) and (25). In a similar fashion one can employ (22) and (24) to show that λ_k^+(A_α) ≥ λ_k^+(A_0), and we conclude that

(26)  0 ≤ λ_k^+(A_α) − λ_k^+(A_0) ≤ α  for k = 1, 2, ....

(Inequalities of this kind for Hermitian matrices are discussed on page 396 in Golub and Van Loan [18].) Recall that we have assumed that the eigenvalues of A_0 decay exponentially:

(27)  |λ_k(A_0)| ≤ c e^{-Ck}  for k = 1, 2, ....

But {λ_k^+(A_0)} is a subsequence of {|λ_k(A_0)|} and therefore

λ_k^+(A_0) ≤ c e^{-Ck}  for k = 1, 2, ....

If this bound is combined with inequalities (26) and Lemma 5.1, it follows that:

Lemma 5.4. The non-negative eigenvalues of A_α and A_0, sorted in decreasing order, satisfy

(28)  cα ≤ λ_k^+(A_α) ≤ λ_k^+(A_0) + α ≤ c e^{-Ck} + α  for k = 1, 2, ....

Remember that the negative eigenvalues of both A_α and A_0 are well behaved. Inequalities (28) therefore show that the eigenvalues of A_0 that cluster at zero lead to eigenvalues of A_α that are contained in an interval of the form [cα, dα]. For example, let N = N(α) be the smallest positive integer such that c e^{-CN(α)} ≤ α; then

λ_k^+(A_α) ∈ [cα, 2α]  for all k ≥ N(α).

Moreover, N(α) is of order O(ln(α^{-1})).

Remark. The Courant-Fischer-Weyl min-max principle can of course also be used to study the negative elements of the spectra of A_α and A_0. Since the analysis is similar to the one presented above, it suffices to state the result:

(29)  λ_k^-(A_0) ≤ λ_k^-(A_α) ≤ λ_k^-(A_0) + α  for k = 1, 2, ....

Roughly speaking, (26) and (29) show that the difference between the eigenvalues of A_α and A_0 is of order O(α).

5.4. Spectrum. We now know that:

- sp(A_α) ⊂ [−C, C], see Lemma 5.1,
- the negative eigenvalues of A_α are contained in [−b̃, −ã], cf. Lemma 5.3,
- the positive eigenvalues of A_α satisfy cα ≤ λ_k^+(A_α) ≤ c e^{-Ck} + α for k = 1, 2, ....

If we choose

b = max{b̃, C},  a = ã,

we thus obtain our main result:

Theorem 5.1. The spectrum of A_α satisfies

(30)  sp(A_α) ⊂ [−b, −a] ∪ [cα, 2α] ∪ {λ_1, λ_2, ..., λ_{N(α)}} ∪ [a, b],

where

2α < λ_i < a  for i = 1, 2, ..., N(α),
N(α) ≤ (ln(c̃) − ln(α))/C = O(ln(α^{-1})),

provided that α ∈ (0, 1]. The constants a, b, c, c̃, C > 0 do not depend on α.
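Theorem 5.1 and the perturbation bounds (26) and (29) are easy to probe numerically in a finite dimensional setting. The sketch below uses random stand-in blocks (our own construction, not one of the discretized problems from Section 6): it assembles A_0 and A_α and checks that no eigenvalue moves by more than ||E_α|| = α, and that the negative eigenvalues remain bounded away from zero.

```python
# Finite dimensional sanity check of (26), (29) and Theorem 5.1 with random
# stand-in blocks (not a discretized PDE): since A_alpha = A_0 + E_alpha and
# ||E_alpha|| = alpha, the sorted eigenvalues shift by at most alpha (Weyl).
import numpy as np

rng = np.random.default_rng(1)
n = 40
A = np.eye(n) + 0.1 * rng.standard_normal((n, n))   # invertible state operator
B = rng.standard_normal((n, n))
# rows of T decay exponentially, mimicking a "compact-like" observation operator
T = np.exp(-0.3 * np.arange(1, n + 1))[:, None] * rng.standard_normal((n, n))

def kkt(alpha):
    """Assemble the saddle point operator (7) for the given alpha."""
    Z = np.zeros((n, n))
    return np.block([[alpha * np.eye(n), Z, -B.T],
                     [Z, T.T @ T, A.T],
                     [-B, A, Z]])

alpha = 1e-6
ev0 = np.linalg.eigvalsh(kkt(0.0))      # eigenvalues of A_0, ascending
eva = np.linalg.eigvalsh(kkt(alpha))    # eigenvalues of A_alpha
print("largest eigenvalue shift:", np.max(np.abs(eva - ev0)), "<= alpha =", alpha)
print("negative eigenvalues lie in [", eva[eva < 0].min(), ",", eva[eva < 0].max(), "]")
```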

Remark. For the sake of convenience we chose a, the number involved in defining the upper bound for the negative eigenvalues, as the left end point of the positive interval [a, b]. This choice plays no important role in the convergence analysis that will be presented in Section 7. One may define this left end point to be any fixed positive number < b. The crucial observation is that the number of eigenvalues that are larger than 2α and less than a is of order O(ln(α^{-1})).

Krylov subspace solvers are known to handle problems with spectra of the form (30) very well. This will be illuminated by numerical experiments in the next section. The rigorous convergence analysis of the Minimal Residual Method, which is based on (30) and Chebyshev polynomials, is rather standard but very technical. We will return to this issue after the presentation of the examples.

6. Numerical experiments

Example 1. Let Ω = (0, 1) × (0, 1) denote the unit square with boundary ∂Ω and consider the following optimization task:

(31)  min_{v ∈ L^2(Ω), u ∈ H^1(Ω)} { (1/2)||Tu − d||^2_{L^2(∂Ω)} + (α/2)||v||^2_{L^2(Ω)} }

subject to

(32)  −Δu + u = { v in D = (0.25, 0.75) × (0.25, 0.75),  0 in Ω \ D },
(33)  ∂u/∂n = 0 on ∂Ω.

In this case

H_1 = L^2(Ω),  H_2 = H^1(Ω),  H_3 = L^2(∂Ω).

Furthermore, the observation operator T is simply the L^2-trace of u ∈ H^1(Ω):

T : H^1(Ω) → L^2(∂Ω),  u ↦ u|_{∂Ω}.

Note that the weak form of the state equation (32)-(33) reads

(34)  Âu = B̂v,

where

Â : H^1(Ω) → (H^1(Ω))',  u ↦ (u, ·)_{H^1(Ω)},
B̂ : L^2(Ω) → (H^1(Ω))',  v ↦ (v, ·)_{L^2(D)},

and (H^1(Ω))' denotes the dual space of H^1(Ω). In the previous sections we assumed that the operator on the left-hand side of the state equation (2) is a mapping from the state space onto the state space. Note that Â does not fulfill this criterion. This can be fixed by invoking the Riesz map R of H^1(Ω):

R : H^1(Ω) → (H^1(Ω))'.

More precisely, by applying R^{-1} to both sides of (34) we get

R^{-1}Âu = R^{-1}B̂v,

which is of the desired form (2) since

A = R^{-1}Â : H^1(Ω) → H^1(Ω),
B = R^{-1}B̂ : L^2(Ω) → H^1(Ω).

In most practical situations it is inconvenient, due to computational demands, to use the inverse Riesz map. Instead one may employ an approximation of R^{-1} defined by, e.g., multigrid cycles. Another way to put it is that the multigrid cycle is a Riesz map in a norm equivalent space. We will refer to R^{-1} as a preconditioner. Please note that:

- The iteration counts presented below were produced with an approximation of R^{-1} consisting of two sweeps of algebraic multigrid with SSOR as smoother; see [25] for a description of the software framework cbc.block, built on top of FEniCS and PyTrilinos.
- All figures in this section showing eigenvalue distributions of A_α were generated with the true inverse Riesz map R^{-1}.

The optimality system was discretized with the standard Finite Element Method using piecewise linear basis functions. More specifically, the domain Ω was divided into N × N squares and each of these squares was split into two triangles. Table 1 shows that the number of minimal residual iterations needed to solve A_α p = b does not increase significantly as α decreases or as N increases. These results were generated with a true solution p = 0 and a random initial guess p_0. The iteration process was stopped as soon as

||r_k|| / ||r_0|| = ( (A_α p_k − b, A_α p_k − b) / (A_α p_0 − b, A_α p_0 − b) )^{1/2} < 10^{-4},

where r_k = A_α p_k − b is the residual and p_k is the kth approximation of p generated by the algorithm.

Table 1 (rows: mesh parameter N; columns: regularization parameter α). Number of minimal residual iterations required to solve the model problem studied in Example 1.
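For readers who want to experiment without the FEniCS/cbc.block stack used for Table 1, the following self-contained sketch assembles a one dimensional analogue of the discretized optimality system and runs preconditioned MINRES with the stopping rule quoted above. It uses a distributed L^2 observation (as in Example 3) rather than the boundary trace, exact sparse factorizations in place of the algebraic multigrid sweeps, and the variable coefficient of Example 2; all of these are simplifying assumptions, so the printed iteration counts are only qualitatively comparable to Table 1.

```python
# Self-contained 1D sketch (our simplification, not the cbc.block/FEniCS code
# behind Table 1): assemble the blocks of (7) for a 1D state equation and run
# MINRES with a block diagonal Riesz-map preconditioner.
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

N = 64; h = 1.0 / N; n = N + 1
mids = (np.arange(N) + 0.5) * h
k = 2.0 + np.sin(2.0 * np.pi * mids)                  # coefficient, cf. Example 2
main = np.concatenate([[k[0]], k[:-1] + k[1:], [k[-1]]]) / h
Kvar = sp.diags([-k / h, main, -k / h], [-1, 0, 1])   # variable-coefficient stiffness
e = np.ones(n)
M = sp.diags([e[:-1], 4.0 * e, e[:-1]], [-1, 0, 1]) * h / 6.0   # mass matrix
Ahat = (Kvar + M).tocsc()      # weak form of -(k u')' + u
R = (sp.diags([-e[:-1], 2.0 * e, -e[:-1]], [-1, 0, 1]) / h + M).tocsc()  # Riesz map of H^1
B = M                          # control enters through the L^2 pairing
Tt = M                         # distributed observation: T^*T -> mass matrix

alpha = 1e-8
Z = sp.csr_matrix((n, n))
Aalpha = sp.bmat([[alpha * M, Z, -B.T],
                  [Z, Tt, Ahat.T],
                  [-B, Ahat, Z]]).tocsr()
b = np.concatenate([np.zeros(n), M @ np.ones(n), np.zeros(n)])  # T^*d with d = 1

Msolve, Rsolve = spla.splu(sp.csc_matrix(M)), spla.splu(R)
def riesz(x):   # exact diag(M, R, R)^{-1}; the paper uses two AMG sweeps instead
    return np.concatenate([Msolve.solve(x[:n]),
                           Rsolve.solve(x[n:2 * n]),
                           Rsolve.solve(x[2 * n:])])

P = spla.LinearOperator((3 * n, 3 * n), matvec=riesz)
its = 0
def count(xk):
    global its
    its += 1
# rtol is the preconditioned relative residual (scipy >= 1.12; older versions: tol=)
x, info = spla.minres(Aalpha, b, M=P, rtol=1e-4, callback=count)
print(f"MINRES iterations: {its} (converged: {info == 0})")
```

The dependence on α and N can be explored by editing the two parameters at the top; note that MINRES monitors a preconditioned residual norm, which is analogous to, but not identical with, the stopping criterion quoted above.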

Figure 1 shows the eigenvalues of A_α sorted in increasing order. The three intervals used to characterize the spectrum of A_α in (30) in Theorem 5.1 are clearly visible. It is not possible to see the isolated eigenvalues in this figure, but a manual inspection of the numerically computed spectrum reveals that there are a handful of eigenvalues larger than 2α and less than a.

Figure 1. Eigenvalues of A_α, sorted in increasing order, for the model problem studied in Example 1. These numbers were generated with N = 32.

In our analysis we assumed that A_0 inherits the ill-posed nature of the optimization problem (1)-(2), see assumption A6. According to Figure 2 this is indeed the case for the present model problem. Moreover, the right panel of Figure 2 indicates that A_0 satisfies assumption A6, i.e. indicates that the problem is severely ill posed.

Example 2. Example 2 is identical to Example 1, but we introduce a variable coefficient in the state equation:

−∇·(k∇u) + u = { v in D = (0.25, 0.75) × (0.25, 0.75),  0 in Ω \ D },
k(x, y) = 2 + sin(2π(x + y)),  (x, y) ∈ (0, 1)^2.

Table 2 contains the iteration counts for this model problem. Concerning the influence of α and N, we see that the conclusion reached in Example 1 is still valid, but more iterations are required due to the variable coefficient. Information about the eigenvalue distributions of A_α and A_0 can be found in Figures 3 and 4, respectively. The change in the spectrum caused by the variable coefficient is clearly visible in Figure 3, which should be compared with Figure 1. This observation is consistent with the numbers presented in Tables 2 and 1.

Figure 2. The left panel shows the logarithm of the absolute values of the eigenvalues of A_0, sorted in decreasing order. These are results obtained in Example 1 with N = 32. In the right panel we have zoomed in on the interval (179, 367).

Table 2 (rows: mesh parameter N; columns: regularization parameter α). Number of minimal residual iterations required to solve the model problem studied in Example 2.

We would like to emphasize that the same preconditioner is employed in Examples 1 and 2. By inspecting Figures 4 and 2 we conclude that the qualitative properties of the small eigenvalues, in absolute value, of A_0 are not significantly influenced by the coefficient function k. In fact, the right panel in Figure 4 indicates that the present problem is severely ill posed.

Example 3. Our third test case is the standard test problem of the PDE-constrained optimization community:

(35)  min_{v ∈ L^2(Ω), u ∈ H^1(Ω)} { (1/2)||Tu − d||^2_{L^2(Ω)} + (α/2)||v||^2_{L^2(Ω)} }

subject to

(36)  −Δu + u = v in Ω,
(37)  ∂u/∂n = 0 on ∂Ω.

Figure 3. Eigenvalues of A_α, sorted in increasing order, for the model problem studied in Example 2.

Figure 4. The left panel shows the logarithm of the absolute values of the eigenvalues of A_0, sorted in decreasing order. These are results obtained in Example 2 with N = 32. In the right panel we have zoomed in on the interval (300, 367).

We observe that

H_1 = L^2(Ω),  H_2 = H^1(Ω),  H_3 = L^2(Ω),

and that the observation operator T is the embedding

T : H^1(Ω) → L^2(Ω),  u ↦ u.

Table 3 shows that the Minimal Residual Method also solves this problem efficiently. More specifically, there is no significant increase in the number of iterations needed as α decreases or N increases.

Table 3 (rows: mesh parameter N; columns: regularization parameter α). Number of minimal residual iterations required to solve the model problem studied in Example 3.

The eigenvalues associated with our third model problem are depicted in Figure 5. Again, we observe that the spectrum mainly consists of three bounded intervals. The isolated eigenvalues cannot be seen, but were detected by a manual inspection of the computations. Figure 6 contains log-log plots of the absolute values of the eigenvalues of A_0, sorted in decreasing order. Note that the right panel almost shows a straight line and that this is a log-log plot. This indicates that (35)-(37) is mildly ill posed: for a mildly ill-posed problem the eigenvalues of A_0 would be expected to obey

(38)  |λ_i(A_0)| ≤ C i^{-c}  for i = 1, 2, ...,

and hence

ln(|λ_i(A_0)|) ≤ ln(C) − c ln(i).

In the theoretical considerations presented in Section 5 we assumed that the inverse problem (1)-(2) is severely ill posed and that this property is inherited by A_0 in the sense of assumption A6. If instead (1)-(2) is mildly ill posed, then one would expect (38) to hold. The analysis presented in the previous sections can easily be modified to also include such cases. The only significant change concerns the number N(α) of isolated eigenvalues in Theorem 5.1:

N(α) ≤ (Cα^{-1})^{1/c} = O(α^{-1/c}).

We conclude, at least from an asymptotic point of view, that severely ill-posed problems will have fewer isolated eigenvalues than mildly ill-posed problems.
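The difference between the two decay regimes is easy to quantify: counting the indices i for which the eigenvalue bound exceeds 2α gives N(α) directly. The short script below, with arbitrarily chosen constants c and C, illustrates the O(ln(α^{-1})) versus O(α^{-1/c}) growth.

```python
# Counting isolated eigenvalues for assumed decay rates (illustration only):
# severe decay |lambda_i| <= c e^{-C i} gives N(alpha) = O(ln(1/alpha)),
# mild decay |lambda_i| <= C i^{-c} gives N(alpha) = O(alpha^{-1/c}).
import numpy as np

i = np.arange(1, 10**6, dtype=float)
severe = np.exp(-0.5 * i)        # c = 1, C = 0.5 (assumed)
mild = i ** -2.0                 # C = 1, c = 2 (assumed)

for alpha in [1e-2, 1e-4, 1e-6]:
    N_severe = int(np.sum(severe > 2 * alpha))   # grows like ln(1/alpha)
    N_mild = int(np.sum(mild > 2 * alpha))       # grows like alpha^{-1/2}
    print(f"alpha={alpha:.0e}: N_severe={N_severe:3d}, N_mild={N_mild:6d}")
```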

Figure 5. Eigenvalues of A_α, sorted in increasing order, for the model problem studied in Example 3.

Figure 6. The left panel shows a log-log plot of the absolute values of the eigenvalues of A_0, sorted in decreasing order. These are results obtained in Example 3 with N = 32. In the right panel we have zoomed in on part of the spectrum.

7. Convergence analysis

We know that the spectrum of A_α satisfies

(39)  sp(A_α) ⊂ [−b, −a] ∪ [cα, 2α] ∪ {λ_1, λ_2, ..., λ_{N(α)}} ∪ [a, b].

As will become evident below, a convergence analysis based on (39) and Chebyshev polynomial techniques gets very involved. Such arguments are much simpler for systems with spectra contained in one bounded interval and with a finite number of isolated eigenvalues outside this interval. We will now briefly discuss why the spectrum of A_α also is of the latter kind and use this observation to study the performance of the Minimal Residual Method. An analysis based on (39) is presented at the end of this section.

From (29) we find that

λ_k^-(A_α) ≥ λ_k^-(A_0).

Remember that we assume that

|λ_k(A_0)| ≤ c e^{-Ck}  for k = 1, 2, ...,

and consequently, by a subsequence argument,

λ_k^-(A_α) ≥ λ_k^-(A_0) ≥ −c e^{-Ck}  for k = 1, 2, ....

On the other hand, (39) assures that the negative eigenvalues satisfy λ_k^-(A_α) ≤ −a < 0, and we conclude that

0 < a ≤ |λ_k^-(A_α)| ≤ c e^{-Ck}  for k = 1, 2, ....

This means that A_α can only have a finite number n of negative eigenvalues, and that n is bounded independently of α. Furthermore, (28) implies that the number M(α) of eigenvalues that are larger than 2α must be of order O(ln(α^{-1})). We conclude that:

Theorem 7.1. The spectrum of A_α satisfies

(40)  sp(A_α) ⊂ {λ_1^-, λ_2^-, ..., λ_n^-} ∪ [cα, 2α] ∪ {λ_1^+, λ_2^+, ..., λ_{M(α)}^+},

where

λ_i^- ≤ −a  for i = 1, 2, ..., n,
λ_i^+ > 2α  for i = 1, 2, ..., M(α),
M(α) ≤ (ln(c̃) − ln(α))/C = O(ln(α^{-1})),

provided that α ∈ (0, 1]. The constants a, c, c̃, C, n > 0 do not depend on α.

It is important to note that this theorem states that the number of eigenvalues outside [cα, 2α] only grows logarithmically as α → 0. We will now show how this result provides a rather straightforward analysis of the asymptotic behavior of the Minimal Residual Method applied to

(41)  A_α p = b

as α → 0. The convergence properties of Krylov subspace solvers applied to systems with isolated eigenvalues have been studied by a number of scientists. In this text we basically combine techniques presented in Axelsson and Lindskog [5], Axelsson [3] and Hackbusch [19].

We begin by exploring the consequences of Theorem 7.1. Recall the distribution (40) of the eigenvalues of A_α:

sp(A_α) ⊂ {λ_1^-, λ_2^-, ..., λ_n^-} ∪ [cα, 2α] ∪ {λ_1^+, λ_2^+, ..., λ_{M(α)}^+}.

For the sake of simplicity, we introduce the notation

λ_i = λ_i^-  for i = 1, 2, ..., n,
λ_{n+i} = λ_i^+  for i = 1, 2, ..., M(α).

The proof of the next theorem is based on the structure of the set consisting of the squares of the eigenvalues of A_α. More precisely, note that

(42)  λ ∈ sp(A_α) ⟹ λ^2 ∈ [c^2α^2, 4α^2] ∪ {λ_1^2, λ_2^2, ..., λ_{n+M(α)}^2},
(43)  4α^2 < λ_i^2  for i = 1, 2, ..., n + M(α),

provided that α ∈ (0, α^*], where α^* = min{1, a/2} (which is needed to guarantee that (43) holds). The property expressed in (42) implies that we can adapt the argument presented in Axelsson and Lindskog [5] in a rather straightforward manner to prove the following theorem:

Theorem 7.2. Let p^* denote the solution of (41), let c be the constant in (40) and let ε > 0 be a given error tolerance. If

(44)  k ≥ 2 ln(2/ε) / ln((1 + c/2)/(1 − c/2)) + 2(n + M(α)) + 4,

then

||p_k − p^*|| / ||p_0 − p^*|| = ( (A_α(p_k − p^*), A_α(p_k − p^*)) / (A_α(p_0 − p^*), A_α(p_0 − p^*)) )^{1/2} ≤ ε,

where p_k is the kth approximation of p^* generated by the Minimal Residual Method applied to (41). The constant n > 0 does not depend on the regularization parameter α ∈ (0, α^*], and M(α) is of order O(ln(α^{-1})):

M(α) ≤ (ln(c̃) − ln(α))/C = O(ln(α^{-1})).

Here α^* = min{1, a/2}, where a > 0 is the constant in (39).

Proof. According to Elman, Silvester and Wathen [14], page 306,

(45)  ||p_k − p^*|| / ||p_0 − p^*|| ≤ min_{Φ_k ∈ Π_k} max_{λ ∈ sp(A_α)} |Φ_k(λ)|,

where Π_k is the set of all polynomials of degree at most k with Φ_k(0) = 1. Let l = k/2 − 1, and consider the polynomial

Φ^*_{l−q/2}(x; c^2α^2, 4α^2) = T_{l−q/2}( (4α^2 + c^2α^2 − 2x)/(4α^2 − c^2α^2) ) / T_{l−q/2}( (4α^2 + c^2α^2)/(4α^2 − c^2α^2) ),

where

(46)  q = q(α) = 2(n + M(α))

is two times the number of isolated eigenvalues in (40), and T_{l−q/2} is the Chebyshev polynomial of degree l − q/2. It is well known that, see e.g. Axelsson and Lindskog [5] and references therein,

(47)  max_{x ∈ [c^2α^2, 4α^2]} |Φ^*_{l−q/2}(x; c^2α^2, 4α^2)| = 2η^{l−q/2} / (1 + η^{2(l−q/2)}) ≤ 2η^{l−q/2},

where

(48)  η = (1 − √(c^2α^2/(4α^2))) / (1 + √(c^2α^2/(4α^2))) = (1 − c/2)/(1 + c/2) ∈ (0, 1)

is independent of α. Please observe that

λ ∈ [cα, 2α] ⟹ λ^2 ∈ [c^2α^2, 4α^2]

and that Φ^*_{l−q/2}(λ^2; c^2α^2, 4α^2) is of degree 2l − q ≤ k − q in λ. Consequently, recalling that q is two times the number of isolated eigenvalues, the polynomial

Ψ_k(λ) = ( ∏_{i=1}^{q/2} (1 − λ^2/λ_i^2) ) Φ^*_{l−q/2}(λ^2; c^2α^2, 4α^2),  Ψ_k(0) = 1,

is in Π_k and satisfies

Ψ_k(±λ_i) = 0  for i = 1, 2, ..., q/2.

Inequalities (43) imply that

|1 − λ^2/λ_i^2| < 1  for all λ ∈ [cα, 2α],

and from (47) we find that

max_{λ ∈ sp(A_α)} |Ψ_k(λ)| ≤ max_{λ ∈ [cα, 2α]} |Ψ_k(λ)|
  ≤ ( max_{λ ∈ [cα, 2α]} ∏_{i=1}^{q/2} |1 − λ^2/λ_i^2| ) ( max_{λ ∈ [cα, 2α]} |Φ^*_{l−q/2}(λ^2; c^2α^2, 4α^2)| )
  ≤ 2η^{l−q/2} = 2η^{k/2−1−q/2}.

Clearly, 2η^{k/2−1−q/2} ≤ ε

if

k ≥ 2 ln(2/ε)/ln(η^{-1}) + q + 4
  = 2 ln(2/ε)/ln(η^{-1}) + 2(n + M(α)) + 4
  = 2 ln(2/ε)/ln((1 + c/2)/(1 − c/2)) + 2(n + M(α)) + 4,

see (46) and (48). Since Ψ_k ∈ Π_k, the theorem is now a consequence of (45). □

This theorem shows that the number of minimal residual iterations needed to solve (41) cannot grow faster than O(ln(α^{-1})) as α → 0. Nevertheless, this result must be regarded as an asymptotic property because the lower bound (44) for the number of iterations required depends on the total number n + M(α) of eigenvalues outside [cα, 2α]. In most practical situations there will be rather many eigenvalues not belonging to [cα, 2α], and a relative error reduction of size ε will be obtained in far fewer than n + M(α) iterations.

We now turn our attention to the characterization of the spectrum of A_α provided by Theorem 5.1:

sp(A_α) ⊂ [−b, −a] ∪ [cα, 2α] ∪ {λ_1, λ_2, ..., λ_{N(α)}} ∪ [a, b].

The squares of the eigenvalues must therefore satisfy

λ^2 ∈ [c^2α^2, 4α^2] ∪ {λ_1^2, λ_2^2, ..., λ_{N(α)}^2} ∪ [a^2, b^2],
4α^2 < λ_i^2 < a^2  for i = 1, 2, ..., N(α).

An analysis of the CG method applied to positive definite systems with spectra contained in two intervals is presented on pages 19-21 in Axelsson [3]. It is possible to adapt Axelsson's argument to the present situation. The main challenge is to incorporate the effect of the isolated eigenvalues. Let

Φ^*_m(x; c^2α^2, 4α^2) = T_m( (4α^2 + c^2α^2 − 2x)/(4α^2 − c^2α^2) ) / T_m( (4α^2 + c^2α^2)/(4α^2 − c^2α^2) ),

P_{N(α)}(x) = ∏_{i=1}^{N(α)} (1 − x/λ_i^2),

Φ^*_{l−q/2}(x; a^2, b^2) = T_{l−q/2}( (b^2 + a^2 − 2x)/(b^2 − a^2) ) / T_{l−q/2}( (b^2 + a^2)/(b^2 − a^2) ),

q = 2(m + N(α)),  l = k/2 − 1,

where k is the number of minimal residual iterations and m is a positive integer that will be specified below. We suggest employing the following

polynomial in the convergence analysis:

(49)  Ψ_k(λ) = Φ^*_m(λ^2; c^2α^2, 4α^2) P_{N(α)}(λ^2) Φ^*_{l−q/2}(λ^2; a^2, b^2),

which has degree

2m + 2N(α) + 2l − q = 2l = k − 2 ≤ k

and Ψ_k(0) = 1, i.e. Ψ_k ∈ Π_k. Our goal is to determine a suitable upper bound for |Ψ_k(λ)| for

λ ∈ [−b, −a] ∪ [cα, 2α] ∪ {λ_1, λ_2, ..., λ_{N(α)}} ∪ [a, b].

This can be accomplished as follows:

Assume that λ ∈ {λ_1, λ_2, ..., λ_{N(α)}}: Then P_{N(α)}(λ^2) = 0, which implies that Ψ_k(λ) = 0.

Assume that λ ∈ [cα, 2α], i.e. λ^2 ∈ [c^2α^2, 4α^2]: We treat each of the three factors in (49) separately:
- Because λ^2 ≤ 4α^2 < λ_i^2, we find that 0 < 1 − λ^2/λ_i^2 < 1 and hence 0 < P_{N(α)}(λ^2) < 1.
- Since λ^2 < a^2 < b^2, it follows that 1 < (b^2 + a^2 − 2λ^2)/(b^2 − a^2) ≤ (b^2 + a^2)/(b^2 − a^2), and by well-known properties of Chebyshev polynomials T_{l−q/2}( (b^2 + a^2 − 2λ^2)/(b^2 − a^2) ) ≤ T_{l−q/2}( (b^2 + a^2)/(b^2 − a^2) ), so 0 < Φ^*_{l−q/2}(λ^2; a^2, b^2) ≤ 1.
- As briefly discussed in connection with (47),

max_{λ^2 ∈ [c^2α^2, 4α^2]} |Φ^*_m(λ^2; c^2α^2, 4α^2)| ≤ 2((1 − c/2)/(1 + c/2))^m.

It is therefore evident that

|Ψ_k(λ)| ≤ 2((1 − c/2)/(1 + c/2))^m.

Note that |Ψ_k(λ)| ≤ ε provided that

(50)  m = ⌈ ln(2/ε) / ln((1 + c/2)/(1 − c/2)) ⌉,

and we have thereby specified the integer m.

Assume that λ ∈ [−b, −a] ∪ [a, b], i.e. λ^2 ∈ [a^2, b^2]: The three factors of Ψ_k, see (49), are first analyzed individually:

- Note that

|1 − λ^2/λ_i^2| = (λ^2 − λ_i^2)/λ_i^2 ≤ b^2/(4α^2),

because 4α^2 < λ_i^2 < λ^2 ≤ b^2. Consequently,

|P_{N(α)}(λ^2)| ≤ (b^2/(4α^2))^{N(α)}.

- Similarly to (47),

|Φ^*_{l−q/2}(λ^2; a^2, b^2)| ≤ 2η̂^{l−q/2},  η̂ = (1 − √(a^2/b^2))/(1 + √(a^2/b^2)) = (1 − a/b)/(1 + a/b).

- The treatment of

Φ^*_m(λ^2; c^2α^2, 4α^2) = T_m( (4α^2 + c^2α^2 − 2λ^2)/(4α^2 − c^2α^2) ) / T_m( (4α^2 + c^2α^2)/(4α^2 − c^2α^2) )

is more involved. Since λ^2 > 4α^2 > c^2α^2, we find that

|(4α^2 + c^2α^2 − 2λ^2)/(4α^2 − c^2α^2)| > 1.

Chebyshev polynomials are known to satisfy |T_m(y)| ≤ (2|y|)^m for |y| > 1, see e.g. page 20 in [3], and we therefore conclude that

|T_m( (4α^2 + c^2α^2 − 2λ^2)/(4α^2 − c^2α^2) )| ≤ ( 2(2λ^2 − 4α^2 − c^2α^2)/((4 − c^2)α^2) )^m ≤ ( 4b^2/((4 − c^2)α^2) )^m.

If this bound is combined with the inequality

T_m( (4α^2 + c^2α^2)/(4α^2 − c^2α^2) ) ≥ (1/2)((1 + c/2)/(1 − c/2))^m,

see page 13 in [3], then we can conclude that

|Φ^*_m(λ^2; c^2α^2, 4α^2)| ≤ 2((1 − c/2)/(1 + c/2))^m ( 4b^2/((4 − c^2)α^2) )^m.

We thus find that

|Ψ_k(λ)| ≤ 2((1 − c/2)/(1 + c/2))^m ( 4b^2/((4 − c^2)α^2) )^m · 2((1 − a/b)/(1 + a/b))^{l−q/2} ( b^2/(4α^2) )^{N(α)},

where l = k/2 − 1 and q = 2(m + N(α)). If m is defined as in (50), then

2((1 − c/2)/(1 + c/2))^m ≤ ε

and

|Ψ_k(λ)| ≤ ε ( 4b^2/((4 − c^2)α^2) )^m · 2((1 − a/b)/(1 + a/b))^{l−q/2} ( b^2/(4α^2) )^{N(α)}.

Consequently,

|Ψ_k(λ)| ≤ ε

for

k ≥ (b/a) [ m ln( 4b^2/((4 − c^2)α^2) ) + N(α) ln( b^2/(4α^2) ) + ln(2) ] + 2m + 2N(α) + 2,
m = ⌈ ln(2/ε) / ln((1 + c/2)/(1 − c/2)) ⌉.

Before we formulate our last convergence result, please recall the basic structure of the spectrum of A_α:

sp(A_α) ⊂ [−b, −a] ∪ [cα, 2α] ∪ {λ_1, λ_2, ..., λ_{N(α)}} ∪ [a, b].

Theorem 7.3. Let p^* denote the solution of (41) and let ε > 0 be a given error tolerance. If

(51)  k ≥ (b/a) [ m ln( 4b^2/((4 − c^2)α^2) ) + N(α) ln( b^2/(4α^2) ) + ln(2) ] + 2m + 2N(α) + 2,
      m = ⌈ ln(2/ε) / ln((1 + c/2)/(1 − c/2)) ⌉,

then

||p_k − p^*|| / ||p_0 − p^*|| = ( (A_α(p_k − p^*), A_α(p_k − p^*)) / (A_α(p_0 − p^*), A_α(p_0 − p^*)) )^{1/2} ≤ ε,

provided that

α ∈ (0, α^*],  α^* = min{1, a/2}.

Here, p_k is the kth approximation of p^* generated by the Minimal Residual Method applied to (41), and N(α) is of order O(ln(α^{-1})):

N(α) ≤ (ln(c̃) − ln(α))/C = O(ln(α^{-1})).
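Theorems 7.2 and 7.3 can be compared by simply evaluating the two lower bounds (44) and (51) for a range of regularization parameters. The helper below does exactly that; all constants (a, b, c, c̃, C, n and the tolerance ε) are arbitrary illustrative choices, not values computed from any of the model problems in Section 6.

```python
# Evaluating the iteration bounds (44) and (51) for arbitrary, illustrative
# constants (not values computed from the model problems of Section 6).
import numpy as np

eps = 1e-4                        # error tolerance
c, a, b = 0.5, 0.1, 10.0          # constants from (39)/(40), assumed
ctil, Ctil, n_neg = 1.0, 1.0, 5   # decay constants and number of negative eigenvalues, assumed

def iso(alpha):                   # number of isolated eigenvalues, O(ln(1/alpha))
    return max(0.0, (np.log(ctil) - np.log(alpha)) / Ctil)

m = np.ceil(np.log(2.0 / eps) / np.log((1 + c / 2) / (1 - c / 2)))   # cf. (50)

for alpha in [1e-2, 1e-4, 1e-6, 1e-8]:
    M = iso(alpha)
    k44 = 2 * np.log(2 / eps) / np.log((1 + c / 2) / (1 - c / 2)) + 2 * (n_neg + M) + 4
    k51 = (b / a) * (m * np.log(4 * b**2 / ((4 - c**2) * alpha**2))
                     + M * np.log(b**2 / (4 * alpha**2)) + np.log(2)) + 2 * m + 2 * M + 2
    print(f"alpha={alpha:.0e}: bound (44) ~ {k44:7.0f}, bound (51) ~ {k51:9.0f}")
```

Bound (44) grows like ln(α^{-1}) while (51) grows like (ln(α))^2, consistent with the discussion that follows.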

Theorem 7.3 shows that the required number of minimal residual iterations cannot grow faster than O((ln(α))^2) as α → 0. One may therefore argue that the convergence behavior expressed in Theorem 7.2 is stronger, at least from an asymptotic point of view. On the other hand, the lower bound for k in Theorem 7.2 involves the total number of eigenvalues outside [cα, 2α], which is not the case in (51). In fact, (51) only depends on the number N(α) of (isolated) eigenvalues between 2α and a, see Theorem 5.1.

Acknowledgments. We would like to thank Prof. Herbert Egger for his excellent talk at the Applied Inverse Problems conference in Vienna in 2009, which inspired the work presented in this paper.

Appendix A. Optimality system

We briefly explain how to derive the optimality system (6) associated with (1)-(2). The Lagrangian L_α for (1)-(2) reads

L_α(v, u, w) = (1/2)||Tu − d||^2_{H_3} + (α/2)||v||^2_{H_1} + (Au, w)_{H_2} − (Bv, w)_{H_2},

with v ∈ H_1 and u, w ∈ H_2. Note that

⟨∂L_α/∂v, φ⟩ = α(v, φ)_{H_1} − (Bφ, w)_{H_2}  for φ ∈ H_1,
⟨∂L_α/∂u, φ⟩ = (Tu − d, Tφ)_{H_3} + (Aφ, w)_{H_2}  for φ ∈ H_2,
⟨∂L_α/∂w, φ⟩ = (Au, φ)_{H_2} − (Bv, φ)_{H_2}  for φ ∈ H_2,

and the first order necessary optimality conditions

∂L_α/∂v = 0,  ∂L_α/∂u = 0,  ∂L_α/∂w = 0

yield the system

(52)  α(v, φ)_{H_1} − (Bφ, w)_{H_2} = 0  for all φ ∈ H_1,
(53)  (Tu, Tφ)_{H_3} + (Aφ, w)_{H_2} = (d, Tφ)_{H_3}  for all φ ∈ H_2,
(54)  (Au, φ)_{H_2} − (Bv, φ)_{H_2} = 0  for all φ ∈ H_2.

Equations (52)-(54) can of course be written in the form (6).

Appendix B. Boundedness and Babuška-Brezzi conditions

Our goal is to derive upper bounds for ||A_α|| and ||A_α^{-1}||. To this end, let us introduce the notation

X = H_1 × H_2,  ||x||^2 = ||(x_1, x_2)||^2 = ||x_1||^2 + ||x_2||^2,  Y = H_2,

M_α = [ αI  0 ;  0  T^*T ] : X → X,

N = [ −B  A ] : X → Y,

f = (0, T^*d).

Then we can write (6) in the form: find x = (v, u) ∈ X and y = w ∈ Y such that

M_α x + N^*y = f,
Nx = 0.

This is a saddle point problem, and we can use standard techniques to analyze it, see e.g. [10]. Note that, for any x = (x_1, x_2) ∈ X, z = (z_1, z_2) ∈ X and α ∈ [0, 1],

|(M_α x, z)| ≤ α|(x_1, z_1)| + |(T^*Tx_2, z_2)|
  = α|(x_1, z_1)| + |(Tx_2, Tz_2)|
  ≤ ||x_1|| ||z_1|| + ||Tx_2|| ||Tz_2||
  ≤ ||x_1|| ||z_1|| + ||T||^2 ||x_2|| ||z_2||
  ≤ ||x|| ||z|| + ||T||^2 ||x|| ||z||
  = (1 + ||T||^2) ||x|| ||z||.

Also,

|(Nx, y)| ≤ |(Bx_1, y)| + |(Ax_2, y)| ≤ ||B|| ||x_1|| ||y|| + ||A|| ||x_2|| ||y|| ≤ (||B|| + ||A||) ||x|| ||y||

for any x = (x_1, x_2) ∈ X and y ∈ Y. Both M_α and N are thus bounded, and we conclude that

||A_α|| ≤ C  for all α ∈ [0, 1].

The coercivity of M_α on the kernel of N involves the size of the regularization parameter α. More specifically, if z = (z_1, z_2) ∈ X = H_1 × H_2 is such that Nz = 0, i.e.

Az_2 = Bz_1,

then (5) implies that

||z_2|| ≤ C ||z_1||.

Consequently,

(M_α z, z) = α(z_1, z_1) + (T^*Tz_2, z_2) = α||z_1||^2 + (Tz_2, Tz_2) ≥ α||z_1||^2
  ≥ 0.5α||z_1||^2 + (α/(2C^2))||z_2||^2 ≥ cα||z||^2.

The inequalities presented in this appendix, assumption A5 and standard theory for saddle point problems [10] assert that A_α is continuously

invertible and that

||A_α^{-1}|| ≤ 1/(cα)  for all α ∈ (0, 1].

References

[1] S. S. Adavani and G. Biros. Multigrid algorithms for inverse problems with linear parabolic PDE constraints. SIAM Journal on Scientific Computing, 31(1), 2008.
[2] D. N. Arnold, R. S. Falk, and R. Winther. Preconditioning discrete approximations of the Reissner-Mindlin plate model. Mathematical Modelling and Numerical Analysis, 31(4), 1997.
[3] O. Axelsson. Solution of linear systems of equations: iterative methods. In A. Dold and B. Eckmann, editors, Sparse Matrix Techniques, Copenhagen 1976, number 572 in Lecture Notes in Mathematics, chapter 1. Springer-Verlag, 1977.
[4] O. Axelsson. Iterative Solution Methods. Cambridge University Press, 1994.
[5] O. Axelsson and G. Lindskog. On the rate of convergence of the preconditioned conjugate gradient method. Numerische Mathematik, 48(5):499-523, 1986.
[6] M. Benzi, G. H. Golub, and J. Liesen. Numerical solution of saddle point problems. Acta Numerica, 14:1-137, 2005.
[7] A. Borzì, K. Kunisch, and Do Y. Kwak. Accuracy and convergence properties of the finite difference multigrid solution of an optimal control optimality system. SIAM Journal on Control and Optimization, 41(5), 2003.
[8] A. Borzì and V. Schulz. Multigrid methods for PDE optimization. SIAM Review, 51(2), 2009.
[9] A. Borzì and V. Schulz. Computational Optimization of Systems Governed by Partial Differential Equations. SIAM, 2012.
[10] D. Braess. Finite Elements. Theory, Fast Solvers, and Applications in Solid Mechanics. Cambridge University Press, second edition, 2001.
[11] H. S. Dollar, N. I. M. Gould, M. Stoll, and A. Wathen. Preconditioning saddle-point systems with applications in optimization. SIAM Journal on Scientific Computing, 32(1):249-270, 2010.
[12] H. Egger. Preconditioning CGNE-iterations for inverse problems. Numerical Linear Algebra with Applications, 14(3), 2007.
[13] H. Egger and A. Neubauer. Preconditioning Landweber iteration in Hilbert scales. Numerische Mathematik, 101(4):643-662, 2005.
[14] H. Elman, D. Silvester, and A. Wathen. Finite Elements and Fast Iterative Solvers: with Applications in Incompressible Fluid Dynamics. Oxford University Press, 2005.
[15] H. C. Elman. Preconditioners for saddle point problems arising in computational fluid dynamics. Applied Numerical Mathematics, 43(1-2):75-89, 2002.
[16] H. C. Elman, D. J. Silvester, and A. Wathen. Block preconditioners for the discrete incompressible Navier-Stokes equations. International Journal for Numerical Methods in Fluids, 40(3-4), 2002.
[17] O. Ghattas. PDE-constrained optimization at the 2005 SIAM conferences on CS&E and optimization. SIAM News, 38(6), 2005.
[18] G. H. Golub and C. F. Van Loan. Matrix Computations. The Johns Hopkins University Press, 1996.
[19] W. Hackbusch. Iterative Solution of Large Sparse Systems of Equations. Springer-Verlag, 1994.
[20] P. C. Hansen. Discrete Inverse Problems: Insight and Algorithms. SIAM, 2010.
[21] E. Haug and R. Winther. A domain embedding preconditioner for the Lagrange multiplier system. Mathematics of Computation, 69(229):65-82, 2000.
[22] M. Heinkenschloss and H. Nguyen. Neumann-Neumann domain decomposition preconditioners for linear-quadratic elliptic optimal control problems. SIAM Journal on Scientific Computing, 28(3), 2006.
[23] M. Hinze, R. Pinnau, M. Ulbrich, and S. Ulbrich. Optimization with PDE Constraints. Springer, 2009.

[24] B. Hofmann and S. Kindermann. On the degree of ill-posedness for linear problems with non-compact operators. Methods and Applications of Analysis, 17, 2010.
[25] K. A. Mardal and J. B. Haga. Block preconditioning of systems of PDEs. In A. Logg, K. A. Mardal, and G. Wells, editors, Automated Solution of Differential Equations by the Finite Element Method. Springer, 2012.
[26] K. A. Mardal and R. Winther. Preconditioning discretizations of systems of partial differential equations. Numerical Linear Algebra with Applications, 18(1):1-40, 2011.
[27] T. P. Mathew, M. Sarkis, and C. E. Schaerer. Analysis of block matrix preconditioners for elliptic optimal control problems. Numerical Linear Algebra with Applications, 14(4):257-279, 2007.
[28] Min-max theorem. Wikipedia.
[29] B. F. Nielsen and K. A. Mardal. Efficient preconditioners for optimality systems arising in connection with inverse problems. SIAM Journal on Control and Optimization, 48(8), 2010.
[30] C. C. Paige and M. A. Saunders. Solution of sparse indefinite systems of linear equations. SIAM Journal on Numerical Analysis, 12(4):617-629, 1975.
[31] T. Rees, H. S. Dollar, and A. Wathen. Optimal solvers for PDE-constrained optimization. SIAM Journal on Scientific Computing, 32(1):271-298, 2010.
[32] T. Rees, M. Stoll, and A. Wathen. All-at-once preconditioning in PDE-constrained optimization. Kybernetika (Prague), 46(2):341-360, 2010.
[33] T. Rusten and R. Winther. A preconditioned iterative method for saddle point problems. SIAM Journal on Matrix Analysis and Applications, 13(3), 1992.
[34] J. Schöberl, R. Simon, and W. Zulehner. A robust multigrid method for elliptic optimal control problems. SIAM Journal on Numerical Analysis, 49(4), 2011.
[35] J. Schöberl and W. Zulehner. Symmetric indefinite preconditioners for saddle point problems with applications to PDE-constrained optimization problems. SIAM Journal on Matrix Analysis and Applications, 29(3):752-773, 2007.
[36] D. Silvester and A. Wathen. Fast iterative solution of stabilized Stokes systems. Part II: Using block diagonal preconditioners. SIAM Journal on Numerical Analysis, 31(5), 1994.
[37] R. Simon and W. Zulehner. On Schwarz-type smoothers for saddle point problems with applications to PDE-constrained optimization problems. Numerische Mathematik, 111(3), 2009.
[38] S. Takacs and W. Zulehner. Convergence analysis of multigrid methods with collective point smoothers for optimal control problems. Computing and Visualization in Science, 14, 2011.
[39] A. Wathen. Preconditioning for PDE-constrained optimization. SIAM News, 43(2), 2010.
[40] G. Wittum. Multigrid methods for Stokes and Navier-Stokes equations. Numerische Mathematik, 54(5), 1989.
[41] W. Zulehner. Nonstandard norms and robust estimates for saddle point problems. SIAM Journal on Matrix Analysis and Applications, 32, 2011.

¹ a) Department of Mathematical Sciences and Technology, Norwegian University of Life Sciences, Norway. b) Simula Research Laboratory. c) Center for Cardiological Innovation, Oslo University Hospital. bjorn.f.nielsen@umb.no

² a) Center for Biomedical Computing, Simula Research Laboratory. b) Department of Informatics, University of Oslo, Norway. kent-and@simula.no


More information

Chapter 2 Finite Element Spaces for Linear Saddle Point Problems

Chapter 2 Finite Element Spaces for Linear Saddle Point Problems Chapter 2 Finite Element Spaces for Linear Saddle Point Problems Remark 2.1. Motivation. This chapter deals with the first difficulty inherent to the incompressible Navier Stokes equations, see Remark

More information

08a. Operators on Hilbert spaces. 1. Boundedness, continuity, operator norms

08a. Operators on Hilbert spaces. 1. Boundedness, continuity, operator norms (February 24, 2017) 08a. Operators on Hilbert spaces Paul Garrett garrett@math.umn.edu http://www.math.umn.edu/ garrett/ [This document is http://www.math.umn.edu/ garrett/m/real/notes 2016-17/08a-ops

More information

SPECTRAL THEORY EVAN JENKINS

SPECTRAL THEORY EVAN JENKINS SPECTRAL THEORY EVAN JENKINS Abstract. These are notes from two lectures given in MATH 27200, Basic Functional Analysis, at the University of Chicago in March 2010. The proof of the spectral theorem for

More information

OPTIMAL SOLVERS FOR PDE-CONSTRAINED OPTIMIZATION

OPTIMAL SOLVERS FOR PDE-CONSTRAINED OPTIMIZATION OPTIMAL SOLVERS FOR PDE-CONSTRAINED OPTIMIZATION TYRONE REES, H. SUE DOLLAR, AND ANDREW J. WATHEN Abstract. Optimization problems with constraints which require the solution of a partial differential equation

More information

AMS526: Numerical Analysis I (Numerical Linear Algebra) Lecture 23: GMRES and Other Krylov Subspace Methods; Preconditioning

AMS526: Numerical Analysis I (Numerical Linear Algebra) Lecture 23: GMRES and Other Krylov Subspace Methods; Preconditioning AMS526: Numerical Analysis I (Numerical Linear Algebra) Lecture 23: GMRES and Other Krylov Subspace Methods; Preconditioning Xiangmin Jiao SUNY Stony Brook Xiangmin Jiao Numerical Analysis I 1 / 18 Outline

More information

A Review of Preconditioning Techniques for Steady Incompressible Flow

A Review of Preconditioning Techniques for Steady Incompressible Flow Zeist 2009 p. 1/43 A Review of Preconditioning Techniques for Steady Incompressible Flow David Silvester School of Mathematics University of Manchester Zeist 2009 p. 2/43 PDEs Review : 1984 2005 Update

More information

A robust optimal preconditioner for the mixed finite element discretization of elliptic optimal control problems

A robust optimal preconditioner for the mixed finite element discretization of elliptic optimal control problems NUMERICAL LINEAR ALGEBRA WITH APPLICATIONS Numer. Linear Algebra Appl. 2010; 00:1 19 Published online in Wiley InterScience (www.interscience.wiley.com). A robust optimal preconditioner for the mixed finite

More information

Journal of Computational and Applied Mathematics. Multigrid method for solving convection-diffusion problems with dominant convection

Journal of Computational and Applied Mathematics. Multigrid method for solving convection-diffusion problems with dominant convection Journal of Computational and Applied Mathematics 226 (2009) 77 83 Contents lists available at ScienceDirect Journal of Computational and Applied Mathematics journal homepage: www.elsevier.com/locate/cam

More information

Numerical behavior of inexact linear solvers

Numerical behavior of inexact linear solvers Numerical behavior of inexact linear solvers Miro Rozložník joint results with Zhong-zhi Bai and Pavel Jiránek Institute of Computer Science, Czech Academy of Sciences, Prague, Czech Republic The fourth

More information

Fast solvers for steady incompressible flow

Fast solvers for steady incompressible flow ICFD 25 p.1/21 Fast solvers for steady incompressible flow Andy Wathen Oxford University wathen@comlab.ox.ac.uk http://web.comlab.ox.ac.uk/~wathen/ Joint work with: Howard Elman (University of Maryland,

More information

ON A GENERAL CLASS OF PRECONDITIONERS FOR NONSYMMETRIC GENERALIZED SADDLE POINT PROBLEMS

ON A GENERAL CLASS OF PRECONDITIONERS FOR NONSYMMETRIC GENERALIZED SADDLE POINT PROBLEMS U..B. Sci. Bull., Series A, Vol. 78, Iss. 4, 06 ISSN 3-707 ON A GENERAL CLASS OF RECONDIIONERS FOR NONSYMMERIC GENERALIZED SADDLE OIN ROBLE Fatemeh anjeh Ali BEIK his paper deals with applying a class

More information

Key words. PDE-constrained optimization, space-time methods, preconditioning, Schur complement, domain decomposition, parallel computing.

Key words. PDE-constrained optimization, space-time methods, preconditioning, Schur complement, domain decomposition, parallel computing. DOMAIN DECOMPOSITION IN TIME FOR PDE-CONSTRAINED OPTIMIZATION ANDREW T. BARKER AND MARTIN STOLL Abstract. PDE-constrained optimization problems have a wide range of applications, but they lead to very

More information

Linear Solvers. Andrew Hazel

Linear Solvers. Andrew Hazel Linear Solvers Andrew Hazel Introduction Thus far we have talked about the formulation and discretisation of physical problems...... and stopped when we got to a discrete linear system of equations. Introduction

More information

SEMI-CONVERGENCE ANALYSIS OF THE INEXACT UZAWA METHOD FOR SINGULAR SADDLE POINT PROBLEMS

SEMI-CONVERGENCE ANALYSIS OF THE INEXACT UZAWA METHOD FOR SINGULAR SADDLE POINT PROBLEMS REVISTA DE LA UNIÓN MATEMÁTICA ARGENTINA Vol. 53, No. 1, 2012, 61 70 SEMI-CONVERGENCE ANALYSIS OF THE INEXACT UZAWA METHOD FOR SINGULAR SADDLE POINT PROBLEMS JIAN-LEI LI AND TING-ZHU HUANG Abstract. Recently,

More information

PDE-constrained Optimization and Beyond (PDE-constrained Optimal Control)

PDE-constrained Optimization and Beyond (PDE-constrained Optimal Control) PDE-constrained Optimization and Beyond (PDE-constrained Optimal Control) Youngsoo Choi Introduction PDE-condstrained optimization has broad and important applications. It is in some sense an obvious consequence

More information

The following definition is fundamental.

The following definition is fundamental. 1. Some Basics from Linear Algebra With these notes, I will try and clarify certain topics that I only quickly mention in class. First and foremost, I will assume that you are familiar with many basic

More information

Efficient Solvers for the Navier Stokes Equations in Rotation Form

Efficient Solvers for the Navier Stokes Equations in Rotation Form Efficient Solvers for the Navier Stokes Equations in Rotation Form Computer Research Institute Seminar Purdue University March 4, 2005 Michele Benzi Emory University Atlanta, GA Thanks to: NSF (MPS/Computational

More information

Optimal solvers for PDE-Constrained Optimization

Optimal solvers for PDE-Constrained Optimization Report no. 8/ Optimal solvers for PDE-Constrained Optimization Tyrone Rees Oxford University Computing Laboratory H. Sue Dollar Rutherford Appleton Laboratory Andrew J. Wathen Oxford University Computing

More information

Iterative Methods for Solving A x = b

Iterative Methods for Solving A x = b Iterative Methods for Solving A x = b A good (free) online source for iterative methods for solving A x = b is given in the description of a set of iterative solvers called templates found at netlib: http

More information

PRECONDITIONED SOLUTION OF STATE GRADIENT CONSTRAINED ELLIPTIC OPTIMAL CONTROL PROBLEMS

PRECONDITIONED SOLUTION OF STATE GRADIENT CONSTRAINED ELLIPTIC OPTIMAL CONTROL PROBLEMS PRECONDITIONED SOLUTION OF STATE GRADIENT CONSTRAINED ELLIPTIC OPTIMAL CONTROL PROBLEMS Roland Herzog Susann Mach January 30, 206 Elliptic optimal control problems with pointwise state gradient constraints

More information

Department of Computer Science, University of Illinois at Urbana-Champaign

Department of Computer Science, University of Illinois at Urbana-Champaign Department of Computer Science, University of Illinois at Urbana-Champaign Probing for Schur Complements and Preconditioning Generalized Saddle-Point Problems Eric de Sturler, sturler@cs.uiuc.edu, http://www-faculty.cs.uiuc.edu/~sturler

More information

A NOTE ON THE LADYŽENSKAJA-BABUŠKA-BREZZI CONDITION

A NOTE ON THE LADYŽENSKAJA-BABUŠKA-BREZZI CONDITION A NOTE ON THE LADYŽENSKAJA-BABUŠKA-BREZZI CONDITION JOHNNY GUZMÁN, ABNER J. SALGADO, AND FRANCISCO-JAVIER SAYAS Abstract. The analysis of finite-element-like Galerkin discretization techniques for the

More information

Multigrid Methods for Saddle Point Problems

Multigrid Methods for Saddle Point Problems Multigrid Methods for Saddle Point Problems Susanne C. Brenner Department of Mathematics and Center for Computation & Technology Louisiana State University Advances in Mathematics of Finite Elements (In

More information

Spectral Properties of Saddle Point Linear Systems and Relations to Iterative Solvers Part I: Spectral Properties. V. Simoncini

Spectral Properties of Saddle Point Linear Systems and Relations to Iterative Solvers Part I: Spectral Properties. V. Simoncini Spectral Properties of Saddle Point Linear Systems and Relations to Iterative Solvers Part I: Spectral Properties V. Simoncini Dipartimento di Matematica, Università di ologna valeria@dm.unibo.it 1 Outline

More information

Termination criteria for inexact fixed point methods

Termination criteria for inexact fixed point methods Termination criteria for inexact fixed point methods Philipp Birken 1 October 1, 2013 1 Institute of Mathematics, University of Kassel, Heinrich-Plett-Str. 40, D-34132 Kassel, Germany Department of Mathematics/Computer

More information

DELFT UNIVERSITY OF TECHNOLOGY

DELFT UNIVERSITY OF TECHNOLOGY DELFT UNIVERSITY OF TECHNOLOGY REPORT 13-10 Comparison of some preconditioners for the incompressible Navier-Stokes equations X. He and C. Vuik ISSN 1389-6520 Reports of the Delft Institute of Applied

More information

Efficient Augmented Lagrangian-type Preconditioning for the Oseen Problem using Grad-Div Stabilization

Efficient Augmented Lagrangian-type Preconditioning for the Oseen Problem using Grad-Div Stabilization Efficient Augmented Lagrangian-type Preconditioning for the Oseen Problem using Grad-Div Stabilization Timo Heister, Texas A&M University 2013-02-28 SIAM CSE 2 Setting Stationary, incompressible flow problems

More information

FEniCS Course. Lecture 0: Introduction to FEM. Contributors Anders Logg, Kent-Andre Mardal

FEniCS Course. Lecture 0: Introduction to FEM. Contributors Anders Logg, Kent-Andre Mardal FEniCS Course Lecture 0: Introduction to FEM Contributors Anders Logg, Kent-Andre Mardal 1 / 46 What is FEM? The finite element method is a framework and a recipe for discretization of mathematical problems

More information

The Conjugate Gradient Method

The Conjugate Gradient Method The Conjugate Gradient Method Classical Iterations We have a problem, We assume that the matrix comes from a discretization of a PDE. The best and most popular model problem is, The matrix will be as large

More information

SPECTRAL PROPERTIES OF THE LAPLACIAN ON BOUNDED DOMAINS

SPECTRAL PROPERTIES OF THE LAPLACIAN ON BOUNDED DOMAINS SPECTRAL PROPERTIES OF THE LAPLACIAN ON BOUNDED DOMAINS TSOGTGEREL GANTUMUR Abstract. After establishing discrete spectra for a large class of elliptic operators, we present some fundamental spectral properties

More information

Lecture Note III: Least-Squares Method

Lecture Note III: Least-Squares Method Lecture Note III: Least-Squares Method Zhiqiang Cai October 4, 004 In this chapter, we shall present least-squares methods for second-order scalar partial differential equations, elastic equations of solids,

More information

ENERGY NORM A POSTERIORI ERROR ESTIMATES FOR MIXED FINITE ELEMENT METHODS

ENERGY NORM A POSTERIORI ERROR ESTIMATES FOR MIXED FINITE ELEMENT METHODS ENERGY NORM A POSTERIORI ERROR ESTIMATES FOR MIXED FINITE ELEMENT METHODS CARLO LOVADINA AND ROLF STENBERG Abstract The paper deals with the a-posteriori error analysis of mixed finite element methods

More information

Math 350 Fall 2011 Notes about inner product spaces. In this notes we state and prove some important properties of inner product spaces.

Math 350 Fall 2011 Notes about inner product spaces. In this notes we state and prove some important properties of inner product spaces. Math 350 Fall 2011 Notes about inner product spaces In this notes we state and prove some important properties of inner product spaces. First, recall the dot product on R n : if x, y R n, say x = (x 1,...,

More information

Numerische Mathematik

Numerische Mathematik Numer. Math. (2003) 94: 195 202 Digital Object Identifier (DOI) 10.1007/s002110100308 Numerische Mathematik Some observations on Babuška and Brezzi theories Jinchao Xu, Ludmil Zikatanov Department of Mathematics,

More information

Preconditioning for Nonsymmetry and Time-dependence

Preconditioning for Nonsymmetry and Time-dependence Preconditioning for Nonsymmetry and Time-dependence Andy Wathen Oxford University, UK joint work with Jen Pestana and Elle McDonald Jeju, Korea, 2015 p.1/24 Iterative methods For self-adjoint problems/symmetric

More information

Lecture 9: Krylov Subspace Methods. 2 Derivation of the Conjugate Gradient Algorithm

Lecture 9: Krylov Subspace Methods. 2 Derivation of the Conjugate Gradient Algorithm CS 622 Data-Sparse Matrix Computations September 19, 217 Lecture 9: Krylov Subspace Methods Lecturer: Anil Damle Scribes: David Eriksson, Marc Aurele Gilles, Ariah Klages-Mundt, Sophia Novitzky 1 Introduction

More information

INCOMPLETE FACTORIZATION CONSTRAINT PRECONDITIONERS FOR SADDLE-POINT MATRICES

INCOMPLETE FACTORIZATION CONSTRAINT PRECONDITIONERS FOR SADDLE-POINT MATRICES INCOMPLEE FACORIZAION CONSRAIN PRECONDIIONERS FOR SADDLE-POIN MARICES H. S. DOLLAR AND A. J. WAHEN Abstract. We consider the application of the conjugate gradient method to the solution of large symmetric,

More information

The Mixed Finite Element Multigrid Preconditioned Minimum Residual Method for Stokes Equations

The Mixed Finite Element Multigrid Preconditioned Minimum Residual Method for Stokes Equations The Mixed Finite Element Multigrid Preconditioned Minimum Residual Method for Stokes Equations K. Muzhinji, S. Shateyi, and S, S. Motsa 2 University of Venda, Department of Mathematics, P Bag X5050, Thohoyandou

More information

Robust preconditioners for PDE-constrained optimization with limited observations

Robust preconditioners for PDE-constrained optimization with limited observations BIT Numer Math DOI 10.1007/s10543-016-0635-8 Robust preconditioners for PDE-constrained optimization with limited observations Kent-André Mardal 1,2 Bjørn Fredrik Nielsen 3 Magne Nordaas 1 Received: 6

More information

Uniform inf-sup condition for the Brinkman problem in highly heterogeneous media

Uniform inf-sup condition for the Brinkman problem in highly heterogeneous media Uniform inf-sup condition for the Brinkman problem in highly heterogeneous media Raytcho Lazarov & Aziz Takhirov Texas A&M May 3-4, 2016 R. Lazarov & A.T. (Texas A&M) Brinkman May 3-4, 2016 1 / 30 Outline

More information

2.3 Variational form of boundary value problems

2.3 Variational form of boundary value problems 2.3. VARIATIONAL FORM OF BOUNDARY VALUE PROBLEMS 21 2.3 Variational form of boundary value problems Let X be a separable Hilbert space with an inner product (, ) and norm. We identify X with its dual X.

More information

Some New Elements for the Reissner Mindlin Plate Model

Some New Elements for the Reissner Mindlin Plate Model Boundary Value Problems for Partial Differential Equations and Applications, J.-L. Lions and C. Baiocchi, eds., Masson, 1993, pp. 287 292. Some New Elements for the Reissner Mindlin Plate Model Douglas

More information

A Finite Element Method for an Ill-Posed Problem. Martin-Luther-Universitat, Fachbereich Mathematik/Informatik,Postfach 8, D Halle, Abstract

A Finite Element Method for an Ill-Posed Problem. Martin-Luther-Universitat, Fachbereich Mathematik/Informatik,Postfach 8, D Halle, Abstract A Finite Element Method for an Ill-Posed Problem W. Lucht Martin-Luther-Universitat, Fachbereich Mathematik/Informatik,Postfach 8, D-699 Halle, Germany Abstract For an ill-posed problem which has its origin

More information

arxiv: v1 [math.na] 11 Jul 2011

arxiv: v1 [math.na] 11 Jul 2011 Multigrid Preconditioner for Nonconforming Discretization of Elliptic Problems with Jump Coefficients arxiv:07.260v [math.na] Jul 20 Blanca Ayuso De Dios, Michael Holst 2, Yunrong Zhu 2, and Ludmil Zikatanov

More information

Preconditioned GMRES Revisited

Preconditioned GMRES Revisited Preconditioned GMRES Revisited Roland Herzog Kirk Soodhalter UBC (visiting) RICAM Linz Preconditioning Conference 2017 Vancouver August 01, 2017 Preconditioned GMRES Revisited Vancouver 1 / 32 Table of

More information

Introduction to Iterative Solvers of Linear Systems

Introduction to Iterative Solvers of Linear Systems Introduction to Iterative Solvers of Linear Systems SFB Training Event January 2012 Prof. Dr. Andreas Frommer Typeset by Lukas Krämer, Simon-Wolfgang Mages and Rudolf Rödl 1 Classes of Matrices and their

More information

Linear algebra issues in Interior Point methods for bound-constrained least-squares problems

Linear algebra issues in Interior Point methods for bound-constrained least-squares problems Linear algebra issues in Interior Point methods for bound-constrained least-squares problems Stefania Bellavia Dipartimento di Energetica S. Stecco Università degli Studi di Firenze Joint work with Jacek

More information

Iterative Methods for Smooth Objective Functions

Iterative Methods for Smooth Objective Functions Optimization Iterative Methods for Smooth Objective Functions Quadratic Objective Functions Stationary Iterative Methods (first/second order) Steepest Descent Method Landweber/Projected Landweber Methods

More information

Linear & nonlinear classifiers

Linear & nonlinear classifiers Linear & nonlinear classifiers Machine Learning Hamid Beigy Sharif University of Technology Fall 1396 Hamid Beigy (Sharif University of Technology) Linear & nonlinear classifiers Fall 1396 1 / 44 Table

More information

9.1 Preconditioned Krylov Subspace Methods

9.1 Preconditioned Krylov Subspace Methods Chapter 9 PRECONDITIONING 9.1 Preconditioned Krylov Subspace Methods 9.2 Preconditioned Conjugate Gradient 9.3 Preconditioned Generalized Minimal Residual 9.4 Relaxation Method Preconditioners 9.5 Incomplete

More information

A Domain Decomposition Based Jacobi-Davidson Algorithm for Quantum Dot Simulation

A Domain Decomposition Based Jacobi-Davidson Algorithm for Quantum Dot Simulation A Domain Decomposition Based Jacobi-Davidson Algorithm for Quantum Dot Simulation Tao Zhao 1, Feng-Nan Hwang 2 and Xiao-Chuan Cai 3 Abstract In this paper, we develop an overlapping domain decomposition

More information

LECTURE 1: SOURCES OF ERRORS MATHEMATICAL TOOLS A PRIORI ERROR ESTIMATES. Sergey Korotov,

LECTURE 1: SOURCES OF ERRORS MATHEMATICAL TOOLS A PRIORI ERROR ESTIMATES. Sergey Korotov, LECTURE 1: SOURCES OF ERRORS MATHEMATICAL TOOLS A PRIORI ERROR ESTIMATES Sergey Korotov, Institute of Mathematics Helsinki University of Technology, Finland Academy of Finland 1 Main Problem in Mathematical

More information

Journal of Computational and Applied Mathematics. Optimization of the parameterized Uzawa preconditioners for saddle point matrices

Journal of Computational and Applied Mathematics. Optimization of the parameterized Uzawa preconditioners for saddle point matrices Journal of Computational Applied Mathematics 6 (009) 136 154 Contents lists available at ScienceDirect Journal of Computational Applied Mathematics journal homepage: wwwelseviercom/locate/cam Optimization

More information

OUTLINE ffl CFD: elliptic pde's! Ax = b ffl Basic iterative methods ffl Krylov subspace methods ffl Preconditioning techniques: Iterative methods ILU

OUTLINE ffl CFD: elliptic pde's! Ax = b ffl Basic iterative methods ffl Krylov subspace methods ffl Preconditioning techniques: Iterative methods ILU Preconditioning Techniques for Solving Large Sparse Linear Systems Arnold Reusken Institut für Geometrie und Praktische Mathematik RWTH-Aachen OUTLINE ffl CFD: elliptic pde's! Ax = b ffl Basic iterative

More information

Multilevel Preconditioning of Graph-Laplacians: Polynomial Approximation of the Pivot Blocks Inverses

Multilevel Preconditioning of Graph-Laplacians: Polynomial Approximation of the Pivot Blocks Inverses Multilevel Preconditioning of Graph-Laplacians: Polynomial Approximation of the Pivot Blocks Inverses P. Boyanova 1, I. Georgiev 34, S. Margenov, L. Zikatanov 5 1 Uppsala University, Box 337, 751 05 Uppsala,

More information

Indefinite Preconditioners for PDE-constrained optimization problems. V. Simoncini

Indefinite Preconditioners for PDE-constrained optimization problems. V. Simoncini Indefinite Preconditioners for PDE-constrained optimization problems V. Simoncini Dipartimento di Matematica, Università di Bologna, Italy valeria.simoncini@unibo.it Partly joint work with Debora Sesana,

More information

Math Introduction to Numerical Analysis - Class Notes. Fernando Guevara Vasquez. Version Date: January 17, 2012.

Math Introduction to Numerical Analysis - Class Notes. Fernando Guevara Vasquez. Version Date: January 17, 2012. Math 5620 - Introduction to Numerical Analysis - Class Notes Fernando Guevara Vasquez Version 1990. Date: January 17, 2012. 3 Contents 1. Disclaimer 4 Chapter 1. Iterative methods for solving linear systems

More information

Block-triangular preconditioners for PDE-constrained optimization

Block-triangular preconditioners for PDE-constrained optimization NUMERICAL LINEAR ALGEBRA WITH APPLICATIONS Numer. Linear Algebra Appl. () Published online in Wiley InterScience (www.interscience.wiley.com). DOI:./nla.693 Block-triangular preconditioners for PDE-constrained

More information

Part 3: Trust-region methods for unconstrained optimization. Nick Gould (RAL)

Part 3: Trust-region methods for unconstrained optimization. Nick Gould (RAL) Part 3: Trust-region methods for unconstrained optimization Nick Gould (RAL) minimize x IR n f(x) MSc course on nonlinear optimization UNCONSTRAINED MINIMIZATION minimize x IR n f(x) where the objective

More information

NONSTANDARD NONCONFORMING APPROXIMATION OF THE STOKES PROBLEM, I: PERIODIC BOUNDARY CONDITIONS

NONSTANDARD NONCONFORMING APPROXIMATION OF THE STOKES PROBLEM, I: PERIODIC BOUNDARY CONDITIONS NONSTANDARD NONCONFORMING APPROXIMATION OF THE STOKES PROBLEM, I: PERIODIC BOUNDARY CONDITIONS J.-L. GUERMOND 1, Abstract. This paper analyzes a nonstandard form of the Stokes problem where the mass conservation

More information

Numerical optimization

Numerical optimization Numerical optimization Lecture 4 Alexander & Michael Bronstein tosca.cs.technion.ac.il/book Numerical geometry of non-rigid shapes Stanford University, Winter 2009 2 Longest Slowest Shortest Minimal Maximal

More information

Block triangular preconditioner for static Maxwell equations*

Block triangular preconditioner for static Maxwell equations* Volume 3, N. 3, pp. 589 61, 11 Copyright 11 SBMAC ISSN 11-85 www.scielo.br/cam Block triangular preconditioner for static Maxwell equations* SHI-LIANG WU 1, TING-ZHU HUANG and LIANG LI 1 School of Mathematics

More information

Key words. conjugate gradients, normwise backward error, incremental norm estimation.

Key words. conjugate gradients, normwise backward error, incremental norm estimation. Proceedings of ALGORITMY 2016 pp. 323 332 ON ERROR ESTIMATION IN THE CONJUGATE GRADIENT METHOD: NORMWISE BACKWARD ERROR PETR TICHÝ Abstract. Using an idea of Duff and Vömel [BIT, 42 (2002), pp. 300 322

More information

AMS526: Numerical Analysis I (Numerical Linear Algebra)

AMS526: Numerical Analysis I (Numerical Linear Algebra) AMS526: Numerical Analysis I (Numerical Linear Algebra) Lecture 24: Preconditioning and Multigrid Solver Xiangmin Jiao SUNY Stony Brook Xiangmin Jiao Numerical Analysis I 1 / 5 Preconditioning Motivation:

More information

Incompatibility Paradoxes

Incompatibility Paradoxes Chapter 22 Incompatibility Paradoxes 22.1 Simultaneous Values There is never any difficulty in supposing that a classical mechanical system possesses, at a particular instant of time, precise values of

More information

On the Vorobyev method of moments

On the Vorobyev method of moments On the Vorobyev method of moments Zdeněk Strakoš Charles University in Prague and Czech Academy of Sciences http://www.karlin.mff.cuni.cz/ strakos Conference in honor of Volker Mehrmann Berlin, May 2015

More information

AMS526: Numerical Analysis I (Numerical Linear Algebra for Computational and Data Sciences)

AMS526: Numerical Analysis I (Numerical Linear Algebra for Computational and Data Sciences) AMS526: Numerical Analysis I (Numerical Linear Algebra for Computational and Data Sciences) Lecture 19: Computing the SVD; Sparse Linear Systems Xiangmin Jiao Stony Brook University Xiangmin Jiao Numerical

More information

Downloaded 05/15/18 to Redistribution subject to SIAM license or copyright; see

Downloaded 05/15/18 to Redistribution subject to SIAM license or copyright; see SIAM J. SCI. COMPUT. Vol. 39, No. 5, pp. A2365 A2393 A DATA SCALABLE AUGMENTED LAGRANGIAN KKT PRECONDITIONER FOR LARGE-SCALE INVERSE PROBLEMS NICK ALGER, UMBERTO VILLA, TAN BUI-THANH, AND OMAR GHATTAS

More information

ON ORTHOGONAL REDUCTION TO HESSENBERG FORM WITH SMALL BANDWIDTH

ON ORTHOGONAL REDUCTION TO HESSENBERG FORM WITH SMALL BANDWIDTH ON ORTHOGONAL REDUCTION TO HESSENBERG FORM WITH SMALL BANDWIDTH V. FABER, J. LIESEN, AND P. TICHÝ Abstract. Numerous algorithms in numerical linear algebra are based on the reduction of a given matrix

More information

Lecture 18 Classical Iterative Methods

Lecture 18 Classical Iterative Methods Lecture 18 Classical Iterative Methods MIT 18.335J / 6.337J Introduction to Numerical Methods Per-Olof Persson November 14, 2006 1 Iterative Methods for Linear Systems Direct methods for solving Ax = b,

More information

ITERATIVE METHODS BASED ON KRYLOV SUBSPACES

ITERATIVE METHODS BASED ON KRYLOV SUBSPACES ITERATIVE METHODS BASED ON KRYLOV SUBSPACES LONG CHEN We shall present iterative methods for solving linear algebraic equation Au = b based on Krylov subspaces We derive conjugate gradient (CG) method

More information

AMG for a Peta-scale Navier Stokes Code

AMG for a Peta-scale Navier Stokes Code AMG for a Peta-scale Navier Stokes Code James Lottes Argonne National Laboratory October 18, 2007 The Challenge Develop an AMG iterative method to solve Poisson 2 u = f discretized on highly irregular

More information

1 Extrapolation: A Hint of Things to Come

1 Extrapolation: A Hint of Things to Come Notes for 2017-03-24 1 Extrapolation: A Hint of Things to Come Stationary iterations are simple. Methods like Jacobi or Gauss-Seidel are easy to program, and it s (relatively) easy to analyze their convergence.

More information

Notes on Some Methods for Solving Linear Systems

Notes on Some Methods for Solving Linear Systems Notes on Some Methods for Solving Linear Systems Dianne P. O Leary, 1983 and 1999 and 2007 September 25, 2007 When the matrix A is symmetric and positive definite, we have a whole new class of algorithms

More information