Analysis of two-grid methods: The nonnormal case


Yvan Notay
Service de Métrologie Nucléaire, Université Libre de Bruxelles (C.P. 165/84), 50 Av. F.D. Roosevelt, B-1050 Brussels, Belgium. Email: ynotay@ulb.ac.be

Report GANMN 18-01, March 2018

Abstract. Core results on the algebraic analysis of two-grid methods are extended by establishing relations that bound the field of values (or numerical range) of the iteration matrix. On this basis, bounds are obtained on its norm and numerical radius, leading to rigorous convergence estimates. Numerical illustrations show that the theoretical results deliver qualitatively good predictions, allowing one to anticipate the success or failure of the two-grid method. They also indicate that the field of values and the associated numerical radius are much more reliable convergence indicators than the eigenvalue distribution and the associated spectral radius. On this basis, some discussion is developed about the role of local Fourier (or local mode) analysis for nonsymmetric problems.

Key words. Iterative methods, Convergence Analysis, Linear Systems, Multigrid, Two-grid, Nonnormal matrices, AMG, Preconditioning

AMS subject classification. 65F08, 65F10, 65F50, 65N22

Yvan Notay is Research Director of the Fonds de la Recherche Scientifique - FNRS.

1 Introduction

Regarding linear systems with nonsymmetric (and, more specifically, nonnormal) system matrices, there is an important gap between the general literature on iterative methods and works focused on multigrid methods. Indeed, in the former, many studies address nonnormality effects (see [, 8, 36], to mention just a few), with the outcome that the eigenvalue distribution may be a completely misleading convergence indicator. Nevertheless, the assessment of multigrid methods remains largely dominated by eigenvalue analyses. More precisely, abstract convergence theories (e.g., [9, 29]) are based on norms and are thus robust with respect to nonnormality effects. But their assumptions are difficult to check (especially in the nonsymmetric case), whereas the related bounds involve unknown constants, making them difficult to use for the practical assessment of the potential of a given multigrid scheme. For this purpose, one then most often uses the spectral radius estimated for a two-grid variant in model but typical situations, for which (rigorous) Fourier analysis and local Fourier analysis (LFA, sometimes referred to as local mode analysis) offer convenient tools (e.g., [2, 4, 5, 35, 37]). For more general problems and methods, like algebraic multigrid methods, algebraic convergence theories have been developed [3, 3, 3, 4], which are more general in that they do not require model problem assumptions. For symmetric and positive definite problems, they allow one to bound the spectral radius of the iteration matrix as a function of more tractable parameters. In [26, 27], we extended this approach to nonsymmetric problems, but the focus was kept on eigenvalues. It might therefore come as a surprise that nonnormality effects in two-grid methods can in fact be observed even in quite simple examples. Consider for instance the m² × m² matrix associated with the stencil

    ⎛       −1        ⎞
    ⎜ −α  2+α+β   −1  ⎟    (1.1)
    ⎝       −β        ⎠

on a square m × m grid with Dirichlet-type boundary conditions.
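The matrix of (1.1) can be assembled programmatically. The sketch below is a minimal illustration, assuming the stencil layout reconstructed above (the transcription of (1.1) is partly garbled) and placeholder values of α, β and m, not those of the experiment; it also checks the row- and column-sum property invoked later with Theorem 3.2.

```python
def stencil_matrix(m, stencil):
    """Assemble the m^2 x m^2 matrix of a compact stencil on an m x m grid
    with Dirichlet boundary conditions: connections leaving the grid are
    simply dropped, which makes the boundary row sums positive."""
    n = m * m
    A = [[0.0] * n for _ in range(n)]
    for i in range(m):
        for j in range(m):
            for (di, dj), c in stencil.items():
                ii, jj = i + di, j + dj
                if 0 <= ii < m and 0 <= jj < m:
                    A[i * m + j][ii * m + jj] += c
    return A

# Coefficients as reconstructed above (layout uncertain in this copy);
# alpha, beta and m are placeholder values, not those of the experiment.
alpha, beta, m = 5.0, 3.0, 4
A = stencil_matrix(m, {(0, 0): 2 + alpha + beta,   # center
                       (0, -1): -alpha,            # west
                       (0, 1): -1.0,               # east
                       (-1, 0): -1.0,              # north
                       (1, 0): -beta})             # south
row_sums = [sum(row) for row in A]
col_sums = [sum(A[i][j] for i in range(m * m)) for j in range(m * m)]
```

With this layout, interior rows and columns sum to zero and boundary ones are positive, so the assembled matrix is an M-matrix with nonnegative row and column sums.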
For this matrix, consider the basic but standard two-grid method that uses a single pre-smoothing step with damped Jacobi smoothing (no post-smoothing and damping parameter ω = 1/2), together with a coarse grid correction based on h → 2h coarsening, bilinear interpolation for the prolongation, full weighting for the restriction, and the Galerkin coarse grid matrix. For α = 5, β = m = 3, in Figure 1, left, we depict the computed spectrum of the iteration matrix, whereas in Figure 1, right, we plot the evolution of the relative residual error ‖r_k‖/‖r_0‖ as a function of the number of stationary iterations, together with its estimate ‖r_k‖/‖r_0‖ ≈ ρ^k based on the computed spectral radius ρ of the iteration matrix. Clearly, this estimate is completely misleading here. Of course, the spectral radius correctly reflects the asymptotic convergence, which takes place after sufficiently many iterations have been performed, about 7 in the present example. But such a delay is unacceptable for a multilevel method, and it can in fact prevent any convergence when the two-grid scheme is used recursively in a multigrid cycle, whose

Figure 1: left: spectrum of the iteration matrix; right: relative residual norm as a function of the number of iterations, and its estimate based on the spectral radius ρ(T).

convergence heavily relies on the capacity to achieve a significant error reduction within a few iterations [37, Section 3.2]. On the other hand, any multigrid expert would advise using (line) Gauss-Seidel smoothing instead of damped Jacobi in this example, and this would likely solve the observed convergence issue. But one may prefer Jacobi because, e.g., it is straightforwardly parallelizable, and, anyway, independently of this example, some fundamental questions are raised which deserve to be investigated. In fact, a number of works already mention issues associated with the classical eigenvalue analysis of multigrid methods and develop alternative analysis strategies. Among these, the best known is probably Brandt's half-plane analysis for convection-dominated (singularly perturbed) problems, which is based on using a semi-infinite grid instead of the fully infinite one used in LFA [5, 6]. Other works address more specifically parabolic problems, for which nonnormality effects are most obvious; see, e.g., [5, 6, 2]. Now, while interesting, these previous studies focus on particular applications and the proposed approaches remain heuristic, with the notable exception of [], where rigorous convergence estimates are obtained, which are, however, restricted to a specific parallel-in-time method [2]. What is thus lacking is a study that addresses the raised questions from a general viewpoint, while deriving rigorous and widely usable convergence bounds. In the present work, we bring several contributions to such a study. Our main result is a reworking of the tools proposed in [26, 27], which extend to nonsymmetric matrices the algebraic convergence theory of two-grid methods [3, 3, 4, 3].
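The phenomenon described above, a small spectral radius coexisting with long transient residual growth, can be reproduced with a deliberately simple sketch; the 2 × 2 matrix below is an arbitrary illustration, not the two-grid iteration matrix of the example:

```python
# A nonnormal "iteration matrix": spectral radius 0.5, but a large
# off-diagonal coupling of 100 (an arbitrary value chosen for illustration).
T = [[0.5, 100.0], [0.0, 0.5]]

def matvec(M, v):
    return [sum(M[i][k] * v[k] for k in range(len(v))) for i in range(len(M))]

def norm(v):
    return sum(x * x for x in v) ** 0.5

r = [0.0, 1.0]                       # initial residual
norms = [norm(r)]
for _ in range(60):
    r = matvec(T, r)
    norms.append(norm(r))

# The spectral radius predicts monotone decay by 0.5 per step; in reality
# the residual first grows by roughly two orders of magnitude before the
# asymptotic rate 0.5 sets in.
growth = max(norms) / norms[0]
```

This reproduces, in miniature, the behavior of Figure 1, right: the ρ^k estimate is correct asymptotically but useless during the transient phase.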
Among the results in [26, 27], the easiest to deal with concern the case of a single smoothing step with a symmetric and positive definite smoother (as in the above example). They show that the eigenvalues of the iteration matrix are localized in a bounded region of the complex plane described by two easy-to-assess parameters, which, in fact, are related to a companion symmetric problem. In view of the above discussion, these results seem to lose most

of their relevance. Here we show that the considered region contains not only the eigenvalues of the iteration matrix, but also the whole field of values (or numerical range), and, as such, gives rise to rigorous convergence estimates. These results naturally lead to an enlightening discussion of the field of values with respect to the eigenvalues. (Remember that, in the normal case, the field of values is just the convex hull of the eigenvalues, and hence there is no significant difference between the two.) Simple numerical experiments suggest that the field of values and the associated numerical radius are not only more rigorous convergence indicators than the eigenvalue distribution and the associated spectral radius; they also offer realistic convergence predictions, and might thus represent the proper generalization of the eigenvalue tools universally used in the symmetric case. Finally, our results allow us to enrich the debate about the role of LFA for nonsymmetric problems. Several works discuss the failure of LFA for nonnormal matrices; see [5, 6, 28] and [5, Section 7.5]. Opposite to this, satisfactory convergence predictions for parabolic problems are obtained with LFA in [39] and with a closely related approach in [2, 38]. Here we advocate the idea that, to assess LFA, one should check whether it allows one to correctly predict the extent of the field of values of the iteration matrix, regardless of the actual location of its eigenvalues. The remainder of this work is organized as follows. In Section 2, we present generic tools (some well known and some new) to characterize the convergence of stationary iterations for nonsymmetric systems, while stressing the connections with the field of values. In Section 3, we develop our analysis of two-grid iterations. The results of numerical experiments are presented in Section 4, where we also discuss LFA. Concluding remarks are given in Section 5.
2 Convergence measures for stationary iterations

We start with a few definitions that clarify the notations used throughout this work; in these definitions, we use the standard notations (z, z)_χ = (z, χ z) and ‖z‖_χ = (z, z)_χ^{1/2} for any vector z and any Hermitian positive definite matrix χ.

Definition 2.1. Let C be an n × n matrix, let χ be a Hermitian positive definite n × n matrix, and let S be a given subspace of C^n. We define the norm, the numerical range (or field of values), and the numerical radius of C with respect to χ and S by, respectively,

    ‖C‖_{χ,S} = max_{z ∈ S\{0}} ‖C z‖_χ / ‖z‖_χ ,    (2.1)

    W_{χ,S}(C) = { (z, C z)_χ / (z, z)_χ : z ∈ S\{0} } ,    (2.2)

    w_{χ,S}(C) = max_{z ∈ S\{0}} |(z, C z)_χ| / (z, z)_χ .    (2.3)
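For small matrices, the numerical radius of Definition 2.1 (with χ = I and S = C^n) can be evaluated directly via the classical characterization w(C) = max_θ λ_max((e^{iθ} C + e^{−iθ} C^H)/2). A minimal sketch for the 2 × 2 case, applied to a Jordan block (spectral radius 0, numerical radius 1/2):

```python
import cmath

def hermitian_part_lmax(a, d, b):
    """Largest eigenvalue of the 2x2 Hermitian matrix [[a, b], [conj(b), d]]
    (a and d real)."""
    return (a + d) / 2 + (((a - d) / 2) ** 2 + abs(b) ** 2) ** 0.5

def numerical_radius_2x2(C, samples=720):
    """w(C) = max over theta of lambda_max((e^{i theta} C + e^{-i theta} C^H)/2),
    sampled over theta; chi = I and S = C^2 in the notation of Definition 2.1."""
    w = 0.0
    for k in range(samples):
        t = cmath.exp(1j * 2 * cmath.pi * k / samples)
        a = (t * C[0][0]).real                              # entry (0,0) of the Hermitian part
        d = (t * C[1][1]).real                              # entry (1,1)
        b = (t * C[0][1] + (t * C[1][0]).conjugate()) / 2   # entry (0,1)
        w = max(w, hermitian_part_lmax(a, d, b))
    return w

# Jordan block: all eigenvalues are 0, yet the numerical radius is 1/2.
J = [[0.0, 1.0], [0.0, 0.0]]
w_J = numerical_radius_2x2(J)
```

This already illustrates the gap discussed in this paper: for a nonnormal matrix, the numerical radius can be far larger than the spectral radius.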

We further denote by

    µ̄_{χ,S}(C) = max_{z ∈ S\{0}} Re(z, C z)_χ / (z, z)_χ ,    (2.4)

    µ_{χ,S}(C) = min_{z ∈ S\{0}} Re(z, C z)_χ / (z, z)_χ    (2.5)

the maximal and minimal real part of an element in W_{χ,S}(C). Below, we often use µ_{χ,S}(C^{-1}) for nonsingular C. The following lemma states some useful properties of this quantity; the first of these is borrowed from [32] (we recall the proof for the sake of completeness).

Lemma 2.1. Let C be a nonsingular n × n matrix, let χ be a Hermitian positive definite n × n matrix, and let S be a given invariant subspace of C. Then, µ_{χ,S}(C^{-1}) is positive if and only if µ_{χ,S}(C) is positive, in which case there holds

    µ_{χ,S}(C^{-1}) ≥ µ_{χ,S}(C) / (‖C‖_{χ,S})²  and  µ_{χ,S}(C) ≥ µ_{χ,S}(C^{-1}) / (‖C^{-1}‖_{χ,S})² .    (2.6)

Moreover, for any positive α,

    µ_{χ,S}(C) ≥ α  if and only if  ‖2α C^{-1} − I‖_{χ,S} ≤ 1 .    (2.7)

Proof. One has, for any z ∈ S\{0} and w = C^{-1} z (which is also in S\{0} because S, being an invariant subspace of the nonsingular matrix C, is invariant under C^{-1} as well),

    Re(z, C^{-1} z)_χ = Re(w, C w)_χ ≥ µ_{χ,S}(C) (w, w)_χ ≥ ( µ_{χ,S}(C) / (‖C‖_{χ,S})² ) (C w, C w)_χ ;

hence the "if" statement and the left inequality in (2.6), since (z, z)_χ = (C w, C w)_χ. The "only if" statement and the right inequality in (2.6) follow by permuting the roles of C and C^{-1}. On the other hand, observe that, for any z ∈ S, and letting w = C^{-1} z,

    ‖(2α C^{-1} − I) z‖_χ² = 4α² (w, w)_χ − 4α Re(w, C w)_χ + (z, z)_χ ;

that is,

    (z, z)_χ − ‖(2α C^{-1} − I) z‖_χ² = 4α ( Re(w, C w)_χ − α (w, w)_χ ) .

Hence the left hand side of this equality is nonnegative for all z ∈ S if and only if the right hand side is, which is precisely what is stated in (2.7).

Now we are ready to discuss the convergence of iterative methods.
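Before doing so, the inequalities (2.6) can be checked numerically on a small example (χ = I, S = C²; the matrix C below is arbitrary, chosen with a positive definite Hermitian part):

```python
def sym_eig_bounds(a, b, d):
    """Eigenvalues (min, max) of the symmetric 2x2 matrix [[a, b], [b, d]]."""
    mid, rad = (a + d) / 2, (((a - d) / 2) ** 2 + b * b) ** 0.5
    return mid - rad, mid + rad

def mu_and_norm2(C):
    """mu(C) = lambda_min of the Hermitian part, together with ||C||_2^2,
    for a real 2x2 matrix C (chi = I, S = C^2)."""
    (a, b), (c, d) = C
    mu = sym_eig_bounds(a, (b + c) / 2, d)[0]
    norm2 = sym_eig_bounds(a * a + c * c, a * b + c * d, b * b + d * d)[1]
    return mu, norm2

C = [[2.0, 1.5], [-0.5, 1.0]]   # arbitrary sample matrix
det = C[0][0] * C[1][1] - C[0][1] * C[1][0]
Cinv = [[C[1][1] / det, -C[0][1] / det], [-C[1][0] / det, C[0][0] / det]]

mu_C, n2_C = mu_and_norm2(C)
mu_Ci, n2_Ci = mu_and_norm2(Cinv)
```

Both inequalities in (2.6) then hold with room to spare for this sample matrix.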

Let A u = b be the linear system to solve, let u_0 be some initial approximation, and consider the stationary iteration with a given (nonsingular) preconditioner B:

    For m = 0, 1, ... :  r_m = b − A u_m ,  u_{m+1} = u_m + B^{-1} r_m .    (2.8)

One has, for any m ≥ 1,

    r_m = T r_{m−1} = T^m r_0 ,    (2.9)

where

    T = I − A B^{-1}    (2.10)

is the iteration matrix governing the residual evolution. As is well known, this implies

    ‖r_m‖_χ / ‖r_0‖_χ ≤ ‖T^m‖_{χ,C^n}    (2.11)

for any Hermitian positive definite matrix χ. Note that

    (κ(χ))^{-1/2} ‖r_m‖_χ / ‖r_0‖_χ ≤ ‖r_m‖ / ‖r_0‖ ≤ (κ(χ))^{1/2} ‖r_m‖_χ / ‖r_0‖_χ ,    (2.12)

κ(χ) being the condition number of χ. Hence we are free to choose a χ-norm that facilitates the analysis, as long as χ remains well conditioned. On the other hand, sometimes r_0 belongs to a given invariant subspace S of the right preconditioned matrix C = A B^{-1} = I − T. Then, all subsequent residuals r_m will be in S as well, and the above inequality can be improved:

    ‖r_m‖_χ / ‖r_0‖_χ ≤ ‖T^m‖_{χ,S} .    (2.13)

All that is needed is thus a proper analysis of ‖T^m‖_{χ,S}. Tools for this are given in the following theorem; note that the case where only (2.11) holds (no relevant invariant subspace) is covered by letting S = C^n. In this theorem, (2.14) and (2.15) are standard results (we give a short proof for the sake of completeness), whereas the others are seemingly new; (2.19) and (2.22) are, however, strongly inspired by closely related results in [32] for minimal residual iterations.

Theorem 2.2. Let A and B be nonsingular n × n matrices and let χ be a Hermitian positive definite n × n matrix. Let C = A B^{-1} and let T = I − C. Let S be an invariant subspace of C and assume that µ_{χ,S}(C) > 0. One has

    ‖T^m‖_{χ,S} ≤ (‖T‖_{χ,S})^m    (2.14)

and

    ‖T^m‖_{χ,S} ≤ 2 (w_{χ,S}(T))^m .    (2.15)

Moreover, W χ,s (T ) { z C : µ χ,s (C) Re(z) µ χ,s (C) and Im(z) max ( µ χ,s (ıc), µ χ,s ( ıc) ) } (2.6) and W χ,s (T ) { } z C : z 2 µ χ,s (C ) 2 µ χ,s (C ), (2.7) while, letting µ = min ( µ χ,s (C), µ χ,s (C )), w χ,s (T ) ( max ( µ, µ (C))) 2 ( χ,s + max ( µ χ,s (ıc), µ χ,s ( ıc) )) 2. (2.8) If, in addition, µ χ,s (C ) >, one has further 2 T χ,s ( ( 2 µ χ,s (C )) µχ,s (C)) /2. (2.9) In particular, if µ χ,s (C ), there holds W χ,s (T ) { } z C : Re(z) µ χ,s (C) and z 2 2, (2.2) and w χ,s (T ) ( ) 2 ( µ (C) χ,s + max ( µ χ,s (ıc), µ χ,s ( ıc) )) 2 T χ,s (2.2) µ χ,s (C). (2.22) Proof. Inequality (2.4) is obvious, and it further implies (2.5) because of the inequalities T m χ,s 2 w χ,s (T m ) and w χ,s (T m ) (w χ,s (T m )) m [, 7]. The relation (2.6) is straightforward. Regarding (2.7), observe that, by (2.7), C I, and, hence, 2 µ χ,s (C ) 2 µ χ,s (C ) Therefore, since (for any µ) W χ,s (C 2 µ χ,s (C ) I ) { } z C : z 2 µ χ,s (C ) z W (T ) z W (C) z (2 µ) W (C (2 µ) I),. one obtains W χ,s (T ) { z C : z 2 µ χ,s (C ) } 2 µ χ,s (C ) ; 7

i.e., (2.17). Further, observe that, the smaller µ_{χ,S}(C^{-1}), the less restrictive is the stated condition for z ∈ W_{χ,S}(T). Therefore, we obtain a wider set when exchanging in (2.17) µ_{χ,S}(C^{-1}) for a lower bound, justifying (2.20). The inequality (2.18) follows from (2.16) together with the fact that, by (2.17), the real part of any element in W_{χ,S}(T) cannot be smaller than 1 − 1/µ_{χ,S}(C^{-1}); hence we can take the best of this latter bound and of 1 − µ̄_{χ,S}(C) to bound the minimal (possibly negative) value that the real part can take. Next, (2.19) holds because, for any z ∈ S, and letting w = C z,

    ‖T z‖_χ² = ((I − C) z, (I − C) z)_χ = (z, z)_χ − 2 Re(z, C z)_χ + (w, w)_χ
             ≤ (z, z)_χ − 2 Re(z, C z)_χ + Re(w, C^{-1} w)_χ / µ_{χ,S}(C^{-1})
             = (z, z)_χ − ( 2 − 1/µ_{χ,S}(C^{-1}) ) Re(z, C z)_χ
             ≤ ( 1 − ( 2 − 1/µ_{χ,S}(C^{-1}) ) µ_{χ,S}(C) ) (z, z)_χ ,

the last inequality using 2 − 1/µ_{χ,S}(C^{-1}) > 0. Finally, (2.22) is a straightforward corollary of (2.19), whereas (2.21) follows from (2.18), noting that µ_{χ,S}(C^{-1}) ≥ 1 implies µ̃ ≤ 1, so that the first max reduces to 1 − µ_{χ,S}(C).

In this theorem, a bound on the numerical radius w_{χ,S}(T) based on (2.17) combined with the bound on the real part in (2.16) is apparently lacking. In fact, developing the calculation yields

    w_{χ,S}(T) ≤ ( 1 − ( 2 − 1/µ_{χ,S}(C^{-1}) ) µ_{χ,S}(C) )^{1/2}  under the condition µ_{χ,S}(C^{-1}) > 1/2 ;

that is, the same result as that obtained via (2.19) and the standard inequality w_{χ,S}(T) ≤ ‖T‖_{χ,S}. Similarly, trying to bound w_{χ,S}(T) via (2.20) gives the same bound as that obtained directly from (2.22). Hence the norm-based bounds in Theorem 2.2 are closely related to the field of values analysis; they just offer slightly stronger convergence estimates by getting rid of the factor of 2 in (2.15). On the other hand, the condition µ_{χ,S}(C^{-1}) ≥ 1 needed for the strongest results (2.20)-(2.22) is seemingly restrictive, but it just requires a proper scaling of the preconditioner. Such a scaling condition is unnecessary if one uses minimal residual iterations instead of stationary iterations, since the former systematically apply (locally) the best scaling factor regarding the residual norm reduction.
The corresponding result is then ‖r_{k+1}‖_χ / ‖r_k‖_χ ≤ ( 1 − µ_{χ,S}(C) µ_{χ,S}(C^{-1}) )^{1/2} [32]. Finally, it is worth discussing (2.21) when C is close to a real symmetric matrix, so that max( µ̄_{χ,S}(iC), µ̄_{χ,S}(−iC) ) is negligible. One sees that the upper bound on w_{χ,S}(T) is then close to the spectral radius ρ of the iteration matrix for the nearby symmetric problem. Hence the convergence estimate via (2.15) is essentially accurate, except for the (unessential) factor of 2. In contrast, using (2.22) for a symmetric problem amounts to taking ρ^{1/2} as an upper bound for ρ. When estimating the number of iterations needed to reach a given tolerance, this means an overestimation by a factor of 2.
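The key bound (2.15) can be illustrated numerically: for a sample nonnormal 2 × 2 iteration matrix (an arbitrary choice, with χ = I and S = C²), the norms of successive powers indeed stay below 2 w(T)^m:

```python
import cmath

def lmax_herm(a, d, b):
    # largest eigenvalue of the 2x2 Hermitian matrix [[a, b], [conj(b), d]]
    return (a + d) / 2 + (((a - d) / 2) ** 2 + abs(b) ** 2) ** 0.5

def numerical_radius(C, samples=1000):
    # w(C) = max over theta of lambda_max of the Hermitian part of e^{i theta} C
    w = 0.0
    for k in range(samples):
        t = cmath.exp(1j * 2 * cmath.pi * k / samples)
        w = max(w, lmax_herm((t * C[0][0]).real, (t * C[1][1]).real,
                             (t * C[0][1] + (t * C[1][0]).conjugate()) / 2))
    return w

def spectral_norm(C):
    (a, b), (c, d) = C
    return lmax_herm(a * a + c * c, b * b + d * d, a * b + c * d) ** 0.5

def matmul2(X, Y):
    return [[X[i][0] * Y[0][j] + X[i][1] * Y[1][j] for j in range(2)]
            for i in range(2)]

T = [[0.4, 0.6], [0.0, 0.3]]     # arbitrary nonnormal sample matrix
w = numerical_radius(T)
Tm = T
bounds_hold = True
for m in range(1, 13):
    bounds_hold = bounds_hold and spectral_norm(Tm) <= 2 * w ** m + 1e-9
    Tm = matmul2(Tm, T)
```

The check relies on the power inequality w(T^m) ≤ (w(T))^m together with ‖X‖ ≤ 2 w(X), exactly as in the proof of (2.15).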

3 Convergence of two-grid iterations

Two-grid methods alternate smoothing iterations and coarse grid corrections. Smoothing iterations are stationary iterations with a simple preconditioner, such as Gauss-Seidel or damped Jacobi. The corresponding iteration matrix for the residual evolution is T_s = I − A M^{-1}, where M is the used preconditioner; e.g., M = ω^{-1} diag(A) in the case of damped Jacobi with damping parameter ω. The coarse grid correction is based on solving a coarse representation of the problem with a reduced number of unknowns n_c < n. It thus involves: a restriction matrix R, an n_c × n matrix that provides the coarse grid residual from the fine grid one; a coarse grid matrix A_c, an n_c × n_c matrix that defines the coarse representation of the fine grid matrix; and a prolongation matrix P, an n × n_c matrix that extends to the fine grid the correction computed on the coarse grid. In a two-grid scheme (where the coarse problem is solved exactly, as considered in this work), the residual is multiplied by

    T_c = I − A P A_c^{-1} R

each time one applies a coarse grid correction. Our analysis requires some assumptions that restrict the class of methods taken into consideration. Firstly, we assume that the restriction is the transpose or, more generally (leaving the door open to complex valued prolongations [23]), the conjugate transpose of the prolongation: R = P^H. This is satisfied by many geometric and algebraic multigrid methods, but it is worth noting that several works advocate breaking this rule for nonsymmetric problems [7, 8, 9, 23, 33]. Secondly, we assume that one uses the Galerkin coarse grid matrix; i.e., that A_c = R A P (= P^H A P). This choice is nearly universal with algebraic multigrid methods, but, with geometric multigrid methods, rediscretization on the coarse grid is often preferred for practical reasons.
Finally, while some limited results are stated for general smoothers M, the most interesting ones require that M be Hermitian positive definite, which, in practice, for nonsymmetric problems, amounts to restricting the analysis to damped Jacobi smoothing. Moreover, only a single smoothing step is allowed. This is at odds with the usual practice, but we believe that it provides a right way to assess more complex methods, at least from a qualitative viewpoint. In particular, algebraic multigrid methods are in principle designed to work well with simple smoothers, and one single step of Jacobi may be seen as the simplest possible smoothing scheme. It is also worth noting that, as commented in [27, Section 4, Remark 5], the classical algebraic multigrid theory for symmetric matrices is, to a large extent, an analysis that resorts to the same assumptions, supplemented with results showing that the convergence does not deteriorate when using more sophisticated smoothing schemes (while not being able to really predict the resulting improvement). From that point of view, the main difference between our analysis below and this classical theory lies in the lack of such complementary results for nonsymmetric matrices, making the use of supposedly better smoothing schemes yet more heuristic.
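To make the objects of this section concrete, the following sketch assembles a complete two-grid residual iteration, with R = P^T, the Galerkin coarse grid matrix, and a single damped Jacobi step, for a small 1D Laplacian; the sizes and the damping parameter ω = 2/3 are illustrative choices, not those used in this paper:

```python
def matvec(M, v):
    return [sum(M[i][k] * v[k] for k in range(len(v))) for i in range(len(M))]

def solve(M, b):
    """Gaussian elimination with partial pivoting (small dense systems only)."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(M)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(aug[i][k]))
        aug[k], aug[p] = aug[p], aug[k]
        for i in range(k + 1, n):
            f = aug[i][k] / aug[k][k]
            for j in range(k, n + 1):
                aug[i][j] -= f * aug[k][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (aug[i][n] - sum(aug[i][j] * x[j] for j in range(i + 1, n))) / aug[i][i]
    return x

n, nc, omega = 7, 3, 2.0 / 3.0        # illustrative sizes and damping parameter
A = [[2.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0) for j in range(n)]
     for i in range(n)]               # 1D Laplacian with Dirichlet conditions
P = [[0.0] * nc for _ in range(n)]    # linear interpolation (prolongation)
for j in range(nc):
    P[2 * j][j] += 0.5
    P[2 * j + 1][j] = 1.0
    P[2 * j + 2][j] += 0.5
Pt = [[P[i][j] for i in range(n)] for j in range(nc)]          # R = P^T
Ac = [[sum(Pt[a][i] * A[i][k] * P[k][b] for i in range(n) for k in range(n))
       for b in range(nc)] for a in range(nc)]                 # Galerkin A_c = P^T A P

r = [1.0] * n
r0_norm = sum(x * x for x in r) ** 0.5
for _ in range(20):
    Ar = matvec(A, r)
    r = [r[i] - (omega / 2.0) * Ar[i] for i in range(n)]       # r <- (I - A M^{-1}) r
    corr = matvec(P, solve(Ac, matvec(Pt, r)))                 # P A_c^{-1} P^T r
    A_corr = matvec(A, corr)
    r = [r[i] - A_corr[i] for i in range(n)]                   # r <- (I - A P A_c^{-1} P^T) r
rnorm = sum(x * x for x in r) ** 0.5
```

After the coarse grid correction, P^T r vanishes (the Galerkin property P^H T_init = 0 used in the next paragraph), and the residual decays fast, here by several orders of magnitude over 20 iterations.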

Going back to the residual evolution, alternating smoothing iterations and coarse grid corrections implies that, for a complete two-grid step, one has

    r_m = T^m r_0    (3.1)

with (assuming thus R = P^H)

    T = (I − A M^{-1})^{ν_2} (I − A P A_c^{-1} P^H) (I − A M^{-1})^{ν_1} ,

where ν_1 and ν_2 are the number of pre- and post-smoothing steps, respectively. Let then

    T_init = (I − A P A_c^{-1} P^H)(I − A M^{-1})^{ν_1} ,    (3.2)

    T̄ = (I − A P A_c^{-1} P^H)(I − A M^{-1})^{ν_1+ν_2} = (I − A P A_c^{-1} P^H)(I − A X^{-1}) ,    (3.3)

    T_final = (I − A M^{-1})^{ν_2} ,    (3.4)

where the matrix X is such that

    I − A X^{-1} = (I − A M^{-1})^{ν_1+ν_2} .

One may decompose (3.1) as

    r̄_0 = T_init r_0 ,  r̄_m = T̄^m r̄_0 ,  r_{m+1} = T_final r̄_m .

Clearly, the crucial factor regarding the convergence is the matrix T̄ of (3.3). Regarding T_init and T_final, it suffices that their norm is not much larger than one, which is generally not an issue. A further remark here is that P^H T_init = 0 when A_c = P^H A P; that is, when using the Galerkin coarse grid matrix. Then, r̄_0 ∈ R(P)^⊥. Further, R(P)^⊥ is an invariant subspace of T̄ and therefore of the right preconditioned matrix C = I − T̄ that is implicitly defined by this process. It thus follows that two-grid iterations can be analyzed via the assessment of ‖T̄^m‖_{χ,R(P)^⊥}, for which one can use the tools developed in Theorem 2.2. The latter involves quantities for which we prove specific bounds in the following theorem. Observe that the used norm is defined with χ = M^{-1}, which is typically well conditioned, as needed in view of (2.12). All assumptions used in the theorem are discussed above, except the nonsingularity of X_c = P^H X P. As shown in [27], this is in fact the necessary and sufficient condition for the implicitly defined preconditioner to be nonsingular. This assumption always holds in the case of a single smoothing step (ν_1 + ν_2 = 1, implying X = M) with a Hermitian positive definite smoother.

Theorem 3.1. Let A and X be nonsingular n × n matrices, and let χ be a Hermitian positive definite n × n matrix.
Let P be an n × n_c matrix of rank n_c < n and such that A_c = P^H A P and X_c = P^H X P are nonsingular. Let T̄ be the iteration matrix defined by (3.3), and let C = I − T̄ be the corresponding right preconditioned system matrix.

There holds, for any v ∈ R(P)^⊥,

    C v = A (I − P A_c^{-1} P^H A) X^{-1} v    (3.5)

and

    C^{-1} v = X (I − P X_c^{-1} P^H X) A^{-1} v .    (3.6)

If, in addition, ν_1 + ν_2 = 1 and X = M is Hermitian positive definite, let A_H = ½(A + A^H) be the Hermitian part of A, and let

    T_{A_H} = ( I − A_H P (P^H A_H P)^{-1} P^H )( I − A_H M^{-1} )    (3.7)

be the iteration matrix for the corresponding two-grid method. One has, letting M_c = P^H M P,

    µ_{M^{-1},R(P)^⊥}(C) = min_{z ∈ (A^H R(P))^⊥\{0}} Re(z, A z) / (z, M (I − P M_c^{-1} P^H M) z)    (3.8)

        ≥ ( λ_max( (½(A + A^H))^{-1} M (I − P M_c^{-1} P^H M) ) )^{-1}    (3.9)

        = 1 − ρ(T_{A_H})    (3.10)

and

    µ_{M^{-1},R(P)^⊥}(C^{-1}) = µ_{M^{-1},R(P)^⊥}(M A^{-1})    (3.11)

        ≥ µ_{M^{-1},C^n}(M A^{-1})    (3.12)

        = λ_min( ½ M^{1/2} (A^{-1} + A^{-H}) M^{1/2} ) ,    (3.13)

and both these quantities are positive if A_H is positive definite.

Proof. By (3.3), one has

    C = I − T̄ = A P A_c^{-1} P^H + A (I − P A_c^{-1} P^H A) X^{-1} ,

from which one straightforwardly deduces (3.5), whereas (3.6) holds because, as shown in [27], such a C can be written C = A B^{-1} with B = X − X P X_c^{-1} P^H (X − A). Next, observe that P^H A w = 0 implies w ∉ R(P)\{0}, because otherwise one would have P^H A P y = A_c y = 0 for some y ≠ 0. Moreover, R(P)^⊥ being an invariant subspace of C, v ∈ R(P)^⊥ implies z = C v ∈ R(P)^⊥. Then, using (3.6) with X = M, one obtains

    µ_{M^{-1},R(P)^⊥}(C) = min_{v ∈ R(P)^⊥\{0}} Re(v, C v)_{M^{-1}} / (v, v)_{M^{-1}}
        = min_{z ∈ R(P)^⊥\{0}} Re(C^{-1} z, M^{-1} z) / (C^{-1} z, M^{-1} C^{-1} z)
        = min_{z ∈ R(P)^⊥\{0}} Re( M (I − P M_c^{-1} P^H M) A^{-1} z, M^{-1} z ) / ( M (I − P M_c^{-1} P^H M) A^{-1} z, M^{-1} M (I − P M_c^{-1} P^H M) A^{-1} z )

        = min_{z ∈ R(P)^⊥\{0}} Re(A^{-1} z, z) / (A^{-1} z, M (I − P M_c^{-1} P^H M) A^{-1} z)
        = min_{w ∈ (A^H R(P))^⊥\{0}} Re(w, A w) / (w, M (I − P M_c^{-1} P^H M) w) ,

where the first of the last two equalities uses (I − M P M_c^{-1} P^H) z = z for z ∈ R(P)^⊥, and the second the substitution w = A^{-1} z. This proves (3.8). Since, moreover, Re(w, A w) = (w, A_H w) and, for every w,

    (w, M (I − P M_c^{-1} P^H M) w) ≤ λ_max( A_H^{-1} M (I − P M_c^{-1} P^H M) ) (w, A_H w) ,

(3.9) follows, whereas (3.10) further follows from [26, Theorem 2.1]. On the other hand, using again (3.6) with X = M,

    µ_{M^{-1},R(P)^⊥}(C^{-1}) = min_{z ∈ R(P)^⊥\{0}} Re(z, C^{-1} z)_{M^{-1}} / (z, z)_{M^{-1}}
        = min_{z ∈ R(P)^⊥\{0}} Re(z, (I − P M_c^{-1} P^H M) A^{-1} z) / (z, M^{-1} z)
        = min_{z ∈ R(P)^⊥\{0}} Re(z, A^{-1} z) / (z, M^{-1} z)
        = µ_{M^{-1},R(P)^⊥}(M A^{-1})
        ≥ min_{z ∈ C^n\{0}} Re(z, A^{-1} z) / (z, M^{-1} z) = λ_min( ½ M^{1/2} (A^{-1} + A^{-H}) M^{1/2} ) ,

proving (3.11), (3.12) and (3.13).

Combined, (2.16), (2.17), (3.10) and (3.13) provide bounds on the field of values which, in fact, coincide with those obtained in [26, 27] for just the eigenvalues. While these former results are heuristic (or even debatable) regarding the convergence of the iterative method, rigorous bounds are now obtained via (2.15) and (2.18) or (2.21), or via (2.14) and (2.19) or (2.22). Moreover, in many cases it should not be too difficult to assess the two parameters involved; that is, the right hand sides of (3.9)/(3.10) and (3.12)/(3.13). Regarding (3.9)/(3.10), one is left with the standard analysis of the spectral radius of the iteration matrix associated with a companion symmetric problem. Hence, all tools for the symmetric case can be used here, including (rigorous) Fourier analysis [37] and other results more adapted to algebraic multigrid methods [24, 3]. Regarding (3.12)/(3.13), we first note that µ_{M^{-1},C^n}(M A^{-1}) varies linearly with the scaling of M. Hence, the condition µ_{χ,S}(C^{-1}) ≥ 1 that is useful when applying Theorem 2.2 can be enforced with a proper scaling; that is, with a proper selection of the damping parameter ω in case one uses Jacobi. That said, as pointed out in [27], λ_min( ½ M^{1/2} (A^{-1} + A^{-H}) M^{1/2} ) is in general smaller than λ_min( M^{1/2} A_H^{-1} M^{1/2} ), which can be easier to assess.
It is then useful to recall the following result from [25], which, combined with Lemma 2.1, leads directly to a bound on µ_{M^{-1},C^n}(M A^{-1}) when M = ω^{-1} diag(A).

Figure 2: left: spectrum of the iteration matrix T, with the boundary of W_{M^{-1},R(P)^⊥}(T) (- -), and the bound on the latter based on (2.20), (3.10) (----); right: relative residual norm as a function of the number of iterations, and the two estimates based, respectively, on the spectral radius ρ(T) and on the numerical radius w_{M^{-1},R(P)^⊥}(T).

Theorem 3.2. Let A be a nonsingular M-matrix with row and column sums both nonnegative, and let D = diag(A) and M = ω^{-1} D for some positive parameter ω. One has

    ‖I − D^{-1/2} A D^{-1/2}‖_2 = ‖I − A D^{-1}‖_{D^{-1}} ≤ 1    (3.14)

and

    µ_{M^{-1},C^n}(M A^{-1}) ≥ 1/(2ω) .    (3.15)

Proof. The inequality (3.14) is proved in [25, Lemma 3.1], whereas (3.15) then follows from (2.7) and Definition 2.1, which imply µ_{M^{-1},S}(M A^{-1}) = ω^{-1} µ_{D^{-1},S}(D A^{-1}) when M = ω^{-1} D.

Hence, when the stated assumptions hold, it suffices to select ω = 1/2 to guarantee that the scaling condition µ_{χ,S}(C^{-1}) ≥ 1 is met. This is precisely what happens in the example considered in the Introduction (see Figure 1), in which the system matrix is an M-matrix with both row and column sums nonnegative, while we used damped Jacobi with ω = 1/2. Now, it is interesting to reconsider this example in the light of the above results; that is, besides the eigenvalues, to draw as well the field of values and the bound on the latter obtained with (2.20) and (3.10). This is done in Figure 2, left, where one clearly sees that the field of values spreads significantly away from the convex hull of the eigenvalues, while being nicely estimated by the theoretical results. Unsurprisingly, one then sees in Figure 2, right, that, compared with the spectral radius, the numerical radius offers a convergence estimate that is not only more rigorous, but also more accurate.
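Theorem 3.2 is easy to probe numerically. The sketch below uses a 1D upwind convection-diffusion matrix (a simple M-matrix with nonnegative row and column sums, chosen here for illustration only) and estimates ‖I − D^{-1/2} A D^{-1/2}‖_2 by power iteration; in agreement with (3.14), the value stays below 1:

```python
def matvec(M, v):
    return [sum(M[i][k] * v[k] for k in range(len(v))) for i in range(len(M))]

n, a = 12, 3.0
# 1D upwind convection-diffusion matrix: an M-matrix whose row and column
# sums are all nonnegative (zero at interior points, positive near the boundary).
A = [[(2.0 + a) if i == j else
      (-(1.0 + a) if j == i - 1 else (-1.0 if j == i + 1 else 0.0))
      for j in range(n)] for i in range(n)]
d = 2.0 + a                                   # diag(A) is constant here
B = [[(1.0 if i == j else 0.0) - A[i][j] / d for j in range(n)]
     for i in range(n)]                       # B = I - D^{-1/2} A D^{-1/2}
Bt = [[B[j][i] for j in range(n)] for i in range(n)]

# Power iteration on B^T B to estimate the spectral norm ||B||_2.
v = [1.0] * n
for _ in range(500):
    v = matvec(Bt, matvec(B, v))
    s = sum(x * x for x in v) ** 0.5
    v = [x / s for x in v]
w = matvec(Bt, matvec(B, v))
norm_B = sum(v[i] * w[i] for i in range(n)) ** 0.5
```

For such matrices, ‖B‖_2 ≤ (‖B‖_1 ‖B‖_∞)^{1/2} ≤ 1 since B is nonnegative with row and column sums at most one, which is the essence of (3.14).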

4 LFA and numerical examples

LFA, or local mode analysis [2, 5, 37], is used to assess geometric two-grid methods applied to model problems with constant coefficients on a regular grid. The actual grid is extended to an infinite grid so as to get rid of the boundary conditions, which prevent the matrix from being diagonal in the Fourier basis. Nearly equivalently, one may stay with a finite grid but exchange the true boundary conditions for periodic ones. When the matrix is already diagonal in the Fourier basis with the actual boundary conditions, LFA often gives the same results as rigorous Fourier analysis (see [30] for a thorough analysis of this point). If not (as is always the case for nonsymmetric problems), one traditionally hopes that the change of boundary conditions will not perturb the eigenvalue distribution of the two-grid iteration matrix too much, because the effects are supposed to be limited to the boundaries; see [4, 5, 34] for further discussion, motivation and validation. Regarding nonsymmetric problems, however, it is mentioned in several works that the true spectrum may differ significantly from its LFA approximation [5, 6, 28]. In [5, Section 7.5], this is associated with boundary effects that may be visible far in the interior for nonelliptic or singularly perturbed equations.

Here we consider this problem from a different angle. We first observe that a matrix which is diagonal in the Fourier basis is a normal matrix. Hence, with LFA one approximates the actual system matrix by a nearby normal one. Thus, at first sight, LFA is not a proper tool to analyze nonnormality effects, since these are erased from the original problem. Going into the details, the two-grid iteration matrix associated with an LFA approximation is not, strictly speaking, a normal matrix, even though the system matrix is normal. However, in the numerical experiments we performed, we never saw a significant difference between its field of values and the convex hull of its eigenvalues.
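The normality observation is easy to check directly: with periodic boundary conditions a constant-coefficient stencil yields a circulant, hence normal, matrix, whereas the same stencil with truncated (Dirichlet-like) boundaries is not normal as soon as it is nonsymmetric. A hypothetical 1D upwind illustration:

```python
import numpy as np

n, c = 16, 0.5

# Cyclic shift matrices: ones on the (cyclic) sub- and superdiagonal
S_down = np.roll(np.eye(n), -1, axis=1)
S_up = np.roll(np.eye(n), 1, axis=1)

# Upwind stencil [-1-c, 2+c, -1], once with the periodic boundary conditions
# underlying LFA and once truncated to a finite grid
A_per = (2 + c) * np.eye(n) - (1 + c) * S_down - S_up
A_dir = ((2 + c) * np.eye(n)
         - (1 + c) * np.diag(np.ones(n - 1), -1)
         - np.diag(np.ones(n - 1), 1))

def is_normal(B):
    return np.allclose(B @ B.conj().T, B.conj().T @ B)

normal_per = is_normal(A_per)   # circulant: diagonalized by the Fourier basis
normal_dir = is_normal(A_dir)   # truncation destroys normality when c != 0
```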
When there is such a significant difference for the corresponding actual problem, it means that, however close the LFA approximation may be, the latter can at best give a good picture of the eigenvalue distribution or of the field of values, but never of both. In view of our results, we advocate that the relevance of LFA should be assessed on its capacity to approximate the true field of values, independently of the location of the true eigenvalues. In that context, one may see the observed discrepancy between true and LFA-computed spectra, as, e.g., in [5, 28], as good news. However, on which basis could we hope that LFA does any better regarding the field of values? A few heuristic arguments go in that direction. Firstly, numerical comparisons between actual convergence and estimates from infinite grids show that the latter can be accurate for parabolic problems [21, 38, 39]. In [38], a connection is made with the theory of Toeplitz matrices and operators, and the related analysis of some (non multigrid) iterative methods [22]. As is well known, there is a discontinuity between the spectrum of a banded Toeplitz matrix, whatever its size, and the spectrum of the Toeplitz operator obtained in the limit when the size goes to infinity. However, the pseudospectrum is continuous; see [36, Chapter II] for details and additional references. Since, on the other hand, for nonnormal matrices, the pseudospectrum offers a much more reliable convergence indicator than the spectrum [36, Chapter VI], one may

then expect that the eigenvalue computation on infinite grids leads to convergence estimates that are at least qualitatively correct.

One may reach a similar conclusion when considering the field of values. Clearly, the system matrix obtained on a given finite grid is a principal submatrix of the system matrix on any extended grid, implying that the field of values of the former is a subset of the field of values of the latter. Hence, the field of values for the infinite operator should provide a bounding set for the true field of values, and, thus, a worst case convergence estimate. Moreover, the field of values is explicitly known for the special case of a Jordan block [20], and it may be inferred that it smoothly converges towards that of the limiting operator as the size goes to infinity.

Now, it should be clear that the above reasonings are heuristic, no matter how rigorous the supporting theory is. Indeed, the system matrix is often close to Toeplitz (say, it can be block Toeplitz with Toeplitz blocks), but it is rarely exactly Toeplitz. Moreover, even if it were, the two-grid iteration matrix will in general not be a Toeplitz matrix. Similarly, that the system matrix is a principal submatrix of a bigger matrix does not imply that the associated two-grid iteration matrices satisfy a similar nesting property. Hence, numerical experiments are needed to inform the discussion.

Clearly, what is missing in Figure 2 is: on the left picture, the LFA approximation of the spectrum/field of values, obtained by exchanging the boundary conditions for periodic ones; on the right picture, the convergence estimate based on the LFA approximation of the spectral/numerical radius. These gaps are filled in Figure 3, where we also consider other values of the parameters α, β in the stencil (1.1). Going back to the general discussion of our results, one sees that the numerical radius correctly predicts the actual convergence in all tested configurations.
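The principal-submatrix inclusion invoked above is straightforward to verify numerically: the support function of the field of values (the largest eigenvalue of the Hermitian part after rotation) can only grow when the grid, and hence the Toeplitz matrix, is extended. A sketch with hypothetical sizes and stencil:

```python
import numpy as np

def support(B, t):
    """Support function of W(B) in direction e^{it}:
    max over unit z of Re(e^{-it} z^H B z)."""
    Bt = np.exp(-1j * t) * B
    return np.linalg.eigvalsh((Bt + Bt.conj().T) / 2).max()

def toeplitz_upwind(n, c=0.5):
    """Hypothetical nonsymmetric banded Toeplitz matrix [-1-c, 2+c, -1]."""
    return ((2 + c) * np.eye(n)
            - (1 + c) * np.diag(np.ones(n - 1), -1)
            - np.diag(np.ones(n - 1), 1))

A_small, A_big = toeplitz_upwind(10), toeplitz_upwind(20)

# A_small is the leading principal submatrix of A_big, so W(A_small) is a
# subset of W(A_big): pad z with zeros to see that every support value of
# the former is bounded by that of the latter
angles = np.linspace(0.0, 2 * np.pi, 24, endpoint=False)
nested = all(support(A_small, t) <= support(A_big, t) + 1e-10 for t in angles)
```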
The theoretical bound based on Theorems 2.2, 3.1 and 3.2 is qualitatively correct but not always very accurate. Going into the details, (3.10) does well in bounding the real extension of the field of values, but the control of the imaginary extension via (2.7) may be looser; in fact, it remains the same regardless of how far we are from a symmetric example with all eigenvalues real. As commented at the end of Section 2, the related loss of accuracy is, however, limited, the number of iterations being overestimated by a factor of 2 at most.

Regarding LFA, it is also qualitatively correct in all tested examples with respect to the objective of predicting the true field of values and the associated numerical radius. From the accuracy viewpoint, LFA seems in all cases just slightly better than the assessment via our theoretical results. Hence, in the cases where these latter apply, LFA brings little added value because it remains heuristic. This heuristic nature is confirmed with the next experiment, reported in Figure 4, where we check, as the grid size is doubled, the evolution of the true spectrum, of the LFA spectrum, and of the field of values boundary. For a full confirmation of the heuristic arguments above, one would expect that the boundary of the field of values converges towards the convex hull of the LFA spectrum, which is close to the field of values for the infinite grid. However, that does not seem to be the case, all reported quantities varying only slightly with the grid size.
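The factor of 2 just mentioned can be traced to the inequality ‖B‖₂ ≤ 2 w(B) combined with the classical power inequality w(B^k) ≤ w(B)^k, which together give ‖T^k‖₂ ≤ 2 w(T)^k. A small check in the Euclidean inner product, on an arbitrary nonnormal toy matrix rather than the iteration matrix of the paper:

```python
import numpy as np

def num_radius(B, n_angles=720):
    """Numerical radius w(B) = max |z^H B z| over unit z, sampled via
    rotations: w(B) = max_t lambda_max(Hermitian part of e^{it} B)."""
    ts = np.linspace(0.0, 2 * np.pi, n_angles, endpoint=False)
    vals = []
    for t in ts:
        Bt = np.exp(1j * t) * B
        vals.append(np.linalg.eigvalsh((Bt + Bt.conj().T) / 2).max())
    return max(vals)

# Hypothetical nonnormal "iteration matrix": a shifted, scaled Jordan block
T = np.array([[0.5, 0.4, 0.0],
              [0.0, 0.5, 0.4],
              [0.0, 0.0, 0.5]])

w = num_radius(T)
rho = max(abs(np.linalg.eigvals(T)))   # spectral radius: blind to the transient

# ||T^k||_2 <= 2 w(T)^k for every k, while rho(T)^k may underestimate ||T^k||
bounded = all(np.linalg.norm(np.linalg.matrix_power(T, k), 2) <= 2 * w**k + 1e-8
              for k in range(1, 11))
```

For this T one has ρ(T) = 0.5 < w(T), reflecting the transient growth that the spectral radius alone cannot see.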

Figure 3: results for three choices of the parameters α, β in the stencil (1.1); left: spectrum of the iteration matrix T (·), together with its LFA approximation (+), the boundary of W_{M,P}(T) (- -), and the bound on this latter based on (2.21), (3.10) (----); right: relative residual norm as a function of the number of iterations, and the three estimations based, respectively, on the spectral radius ρ(T), on the numerical radius w_{M,P}(T), and on the LFA approximation of the numerical/spectral radius.

Figure 4: results for three choices of the parameters α, β in the stencil (1.1); left and right: boundary of W_{M,P}(T) for m = 24 (- -) and m = 48 (-- --); left: spectrum of the iteration matrix T for m = 24 (+) and m = 48 (·); right: LFA approximation of the spectrum of the iteration matrix T for m = 24 (+) and m = 48 (·).

5 Conclusions

We have seen that, for nonnormal matrices, the eigenvalues of the two-grid iteration matrix may be a completely wrong indicator, unable to predict severe convergence failures. Opposite to this, the field of values of this matrix leads to rigorous convergence estimates, which have also been found accurate in several numerical examples. When the two-grid method involves only a single smoothing step with Jacobi, we proved that this field of values is located in a region of the complex plane described by two easy-to-assess parameters related to a companion symmetric problem. The convergence bounds obtained on this basis, besides being rigorous, have also been found to offer qualitatively correct predictions.

The role of LFA has further been discussed in the light of these results. We advocate that LFA should be assessed on its ability to predict the extent of the field of values of the iteration matrix, regardless of the actual location of its eigenvalues. In the few numerical examples we tested, LFA delivers results roughly equivalent to those obtained with our theoretical bounds; that is, LFA never overestimates the convergence and is always qualitatively correct. This is good news because LFA is more widely applicable than the theoretical bounds. However, one should be cautious, because LFA remains more heuristic than in the symmetric case. This subject thus certainly deserves to be investigated further.

References

[1] C. A. Berger and J. G. Stampfli, Mapping theorems for the numerical range, American Journal of Mathematics, 89 (1967), pp. 1047-1055.

[2] A. Brandt, Multi-level adaptive solutions to boundary-value problems, Math. Comp., 31 (1977), pp. 333-390.

[3] A. Brandt, Algebraic multigrid theory: The symmetric case, Appl. Math. Comput., 19 (1986), pp. 23-56.

[4] A. Brandt, Rigorous quantitative analysis of multigrid, I. Constant coefficients two-level cycle with L2-norm, SIAM J. Numer. Anal., 31 (1994), pp. 1695-1730.

[5] A. Brandt and O.
Livne, Multigrid Techniques: 1984 Guide with Applications to Fluid Dynamics, Revised Edition, SIAM, 2011.

[6] A. Brandt and I. Yavneh, On multigrid solution of high-Reynolds incompressible entering flows, J. Comput. Physics, 101 (1992), pp. 151-164.

[7] M. Brezina, T. Manteuffel, S. McCormick, J. Ruge, and G. Sanders, Towards adaptive smoothed aggregation (αSA) for nonsymmetric problems, SIAM J. Sci. Comput., 32 (2010), pp. 14-39.

[8] P. M. de Zeeuw, Matrix dependent prolongations and restrictions in a blackbox multigrid solver, J. Comput. Appl. Math., 33 (1990), pp. 1-27.

[9] J. E. Dendy, Black box multigrid for nonsymmetric problems, Appl. Math. Comput., 13 (1983), pp. 261-283.

[10] V. A. Dobrev, T. Kolev, N. A. Petersson, and J. B. Schroder, Two-level convergence theory for multigrid reduction in time (MGRIT), SIAM J. Sci. Comput., 39 (2017), pp. S501-S527.

[11] M. Eiermann, Fields of values and iterative methods, Linear Algebra Appl., 180 (1993), pp. 167-197.

[12] R. D. Falgout, S. Friedhoff, T. V. Kolev, S. P. MacLachlan, and J. B. Schroder, Parallel time integration with multigrid, SIAM J. Sci. Comput., 36 (2014), pp. C635-C661.

[13] R. D. Falgout and P. S. Vassilevski, On generalizing the algebraic multigrid framework, SIAM J. Numer. Anal., 42 (2004), pp. 1669-1693.

[14] R. D. Falgout, P. S. Vassilevski, and L. T. Zikatanov, On two-grid convergence estimates, Numer. Linear Algebra Appl., 12 (2005), pp. 471-494.

[15] S. Friedhoff and S. MacLachlan, A generalized predictive analysis tool for multigrid methods, Numer. Linear Algebra Appl., 22 (2015), pp. 618-647.

[16] S. Friedhoff, S. MacLachlan, and C. Börgers, Local Fourier analysis of space-time relaxation and multigrid schemes, SIAM J. Sci. Comput., 35 (2013), pp. S250-S276.

[17] M. Goldberg and E. Tadmor, On the numerical radius and its applications, Linear Algebra Appl., 42 (1982), pp. 263-284.

[18] A. Greenbaum, V. Pták, and Z. Strakoš, Any nonincreasing convergence curve is possible for GMRES, SIAM J. Matrix Anal. Appl., 17 (1996), pp. 465-469.

[19] W. Hackbusch, Multi-grid Methods and Applications, Springer, Berlin, 1985.

[20] R. Horn and C. Johnson, Topics in Matrix Analysis, Cambridge University Press, New York, 1991.

[21] J. Janssen and S. Vandewalle, Multigrid waveform relaxation on spatial finite element meshes: The discrete-time case, SIAM J. Sci. Comput., 17 (1996), pp. 133-155.

[22] A. Lumsdaine and D. Wu, Spectra and pseudospectra of waveform relaxation operators, SIAM J. Sci. Comput., 18 (1997), pp. 286-304.

[23] S. P. MacLachlan and C. W. Oosterlee, Algebraic multigrid solvers for complex-valued matrices, SIAM J.
Sci. Comput., 30 (2008), pp. 1548-1571.

[24] A. Napov and Y. Notay, Algebraic analysis of aggregation-based multigrid, Numer. Linear Algebra Appl., 18 (2011), pp. 539-564.

[25] Y. Notay, A robust algebraic multilevel preconditioner for non-symmetric M-matrices, Numer. Linear Algebra Appl., 7 (2000), pp. 243-267.

[26] Y. Notay, Algebraic analysis of two-grid methods: The nonsymmetric case, Numer. Linear Algebra Appl., 17 (2010), pp. 73-96.

[27] Y. Notay, Algebraic theory of two-grid methods, Numer. Math. Theor. Meth. Appl., 8 (2015), pp. 168-198.

[28] C. Oosterlee, F. Gaspar, T. Washio, and R. Wienands, Multigrid line smoothers for higher order upwind discretizations of convection-dominated problems, J. Comput. Physics, 139 (1998), pp. 274-307.

[29] P. Oswald, Multilevel Finite Element Approximation: Theory and Applications, Teubner Skripte zur Numerik, Teubner, Stuttgart, 1994.

[30] C. Rodrigo, F. Gaspar, and L. Zikatanov, On the validity of local Fourier analysis, J. Comput. Math., (2018). To appear.

[31] J. W. Ruge and K. Stüben, Algebraic multigrid (AMG), in Multigrid Methods, S. F. McCormick, ed., Frontiers in Appl. Math. 3, SIAM, Philadelphia, 1987, pp. 73-130.

[32] Y. Saad, Further analysis of minimum residual iterations, Numer. Linear Algebra Appl., 7 (2000), pp. 67-93.

[33] M. Sala and R. S. Tuminaro, A new Petrov-Galerkin smoothed aggregation preconditioner for nonsymmetric linear systems, SIAM J. Sci. Comput., 31 (2008), pp. 143-166.

[34] R. Stevenson, On the validity of local mode analysis of multi-grid methods, dissertation, Utrecht University, Utrecht, the Netherlands, 1990.

[35] K. Stüben and U. Trottenberg, Multigrid methods: Fundamental algorithms, model problem analysis and applications, in Multigrid Methods, W. Hackbusch and U. Trottenberg, eds., Lecture Notes in Mathematics No. 960, Berlin Heidelberg New York, 1982, Springer-Verlag, pp. 1-176.

[36] L. N. Trefethen and M. Embree, Spectra and Pseudospectra, Princeton University Press, Princeton and Oxford, 2005.

[37] U. Trottenberg, C. W. Oosterlee, and A.
Schüller, Multigrid, Academic Press, London, 2001.

[38] J. Van lent, Multigrid Methods for Time-Dependent Partial Differential Equations, dissertation, K.U.Leuven, Leuven, Belgium, 2006.

[39] S. Vandewalle and G. Horton, Fourier mode analysis of the multigrid waveform relaxation and time-parallel multigrid methods, Computing, 54 (1995), pp. 317-330.