Conjugate Gradient Type Methods for Solving Symmetric, Indefinite Linear Systems


Jan Modersitzki *)
Institute of Applied Mathematics, Bundesstraße, Hamburg, Germany
modersitzki@math.uni-hamburg.de

Abstract. The conjugate gradient algorithm (CG) is an effective tool for solving a system of linear equations with a positive definite coefficient matrix. We show the reasons for a possible breakdown of the method when applied to a symmetric system with an indefinite coefficient matrix. Although in finite arithmetic a breakdown of the method occurs rather seldom, near breakdowns may slow down the speed of convergence. We propose a look-ahead technique to overcome these near breakdowns. Moreover, we present a modification of the CG algorithm applicable to symmetric indefinite systems (CGI). The CGI algorithm is nearly as efficient as the plain CG algorithm. We compare the CGI algorithm with a modification of the composite step bi-conjugate gradient algorithm of Bank and Chan.

Key words. Krylov subspace iteration, symmetric indefinite matrices, CG algorithm, look-ahead techniques, polynomial iterative methods, orthogonal polynomials, formally orthogonal polynomials.

AMS(MOS) subject classification. 65F10

1 Introduction

Hestenes and Stiefel [9] proposed in 1952 the conjugate gradient algorithm (CG) for solving a system of linear equations

    $A x = b$,                                                           (1.1)

where $A \in \mathbb{R}^{N \times N}$ is a symmetric, positive definite matrix. In general it is not advisable to apply the CG algorithm to an indefinite system. In numerical applications we may observe that the method fails to converge. Moreover, in rare situations a division by zero may occur, and the CG algorithm breaks down. Luenberger [11] introduced the technique of hyperbolic pairs to overcome the CG breakdown. Fletcher [5] pointed out that "Luenberger's algorithm is unsuitable in general because it does not adequately cope with the situation when $(q^{(n)})^T A q^{(n)}$ is small ($\epsilon$ say) but not zero." Fletcher proposed the minimal residual algorithm and the orthogonal direction algorithm. Paige and Saunders [15] presented a stable and effective implementation of the minimal residual algorithm (MINRES) and a Galerkin approach (SYMMLQ). However, none of these algorithms is as efficient as the CG algorithm in terms of computing time per step and memory requirements. All these methods compute approximations in the same Krylov subspaces. Thus, in general they need about the same number of steps for convergence.

Using the relation to orthogonal polynomials, we will show that the CG algorithm must break down when a certain residual polynomial $p^{(n+1)}$ fails to exist. We will characterize this situation and obtain criteria to decide whether $p^{(n+1)}$ exists or not. Theoretically, $p^{(n+m)}$ exists, where either $m = 1$ or $m = 2$. However, in numerical applications it may be advantageous to proceed with $p^{(n+m)}$, where values $m > 2$ are permitted. We will show how to fill the gap in the sequence of orthogonal polynomials and still maintain short recurrence relations. The connection between these recurrence relations and a block $LDL^T$ decomposition of a Lanczos matrix is given. We will introduce the CGI algorithms: extensions of the CG algorithm applicable to symmetric, indefinite linear systems. The CGI(2) algorithm is nearly as effective as the CG algorithm and, in situations where the CG algorithm does not break down, equally fast.

*) This research was partially carried out while the author was a guest at the Mathematical Institute, Utrecht. The research was supported by the HCM project. Version 20-Aug-94.
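The following sketch makes the breakdown concrete (Python/NumPy here and in all later sketches; the paper's own experiments are MATLAB implementations, cf. Section 7). It runs plain CG on a symmetric system with a guard on the curvature $(q^{(n)})^T A q^{(n)}$, whose (near) vanishing for an indefinite matrix causes the division by zero discussed above. Function name and guard threshold are our own choices, not taken from the paper.

```python
import numpy as np

def cg_with_breakdown_monitor(A, b, x0, tol=1e-8, maxit=2000):
    """Plain CG; raises when the curvature q^T A q (nearly) vanishes,
    which cannot happen for positive definite A but can for indefinite A."""
    x = x0.copy()
    r = b - A @ x
    q = r.copy()
    rho = r @ r
    for n in range(maxit):
        if np.sqrt(rho) <= tol:
            return x, n
        Aq = A @ q
        e = (q @ Aq) / rho                 # 1/e is the CG step size
        if abs(e) < 1e-14 * np.linalg.norm(Aq) / np.sqrt(rho):  # heuristic guard
            raise RuntimeError(f"(near) breakdown at step {n}: q^T A q ~ 0")
        x = x + (1.0 / e) * q
        r = r - (1.0 / e) * Aq
        rho_new = r @ r
        q = r + (rho_new / rho) * q        # new search direction
        rho = rho_new
    return x, maxit
```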

"Spikes" in the residual norm, i.e., high intermediate residual norms, may affect the numerical accuracy (see, e.g., Sleijpen, van der Vorst, and Fokkema [17, §2.1]). Therefore, we base the decision whether to perform a definite or an indefinite step on the residual norm.

We compare the method with a modification of the composite step Bi-CG algorithm of Bank and Chan [2] (CSBCG). Our modification is just the straightforward reduction to the symmetric case, and we call the resulting algorithm the composite step conjugate gradient algorithm (CSCG). The CSBCG algorithm is a variant of the Bi-CG algorithm (see Fletcher [5]) for unsymmetric linear systems. In the CSBCG algorithm a block $LDL^T$ decomposition is used to avoid a breakdown, where the block size is based on the residual norm. It turns out that the CSCG algorithm and the CGI(2) algorithm are mathematically equivalent. However, the computational costs per CSCG step are higher than those for a CGI step.

This paper is organized as follows. In Section 2 we introduce our notation. In Section 3 we briefly introduce the CG algorithm and characterize the situations of a possible breakdown. The mixed recurrence relations and their connection to a block $LDL^T$ decomposition of the Lanczos matrix are given in Section 4. The CGI algorithm and the CSCG algorithm are proposed in Section 5. In Section 6 we discuss some implementation aspects. Finally, we present some numerical experiments in Section 7.

2 Basic Notations and Orthonormal Polynomials

We consider the linear system (1.1), where $A \in \mathbb{R}^{N \times N}$ is supposed to be a symmetric and nonsingular matrix. Given an initial guess $x^{(0)}$ for the solution of (1.1), the CG algorithm generates a sequence of iterates $x^{(n)}$ with some particular properties. To discuss these properties, we introduce the $n$th residual $r^{(n)} := b - A x^{(n)}$ and the Krylov subspace

$$K^{(n)} := K^{(n)}(r^{(0)}, A) := \mathrm{span}\{ r^{(0)}, A r^{(0)}, \ldots, A^{n-1} r^{(0)} \}.$$

We may assume that $r^{(0)} \neq 0$ and thus

$$L := \max\{ n : \dim K^{(n)} = n \} > 0. \qquad (2.1)$$

For $n \leq L$ there is an isomorphism between the Krylov subspace and a polynomial space

$$\Pi^{(n-1)} := \Big\{ \varphi(t) = \sum_{j=0}^{n-1} a_j t^j,\ a_0, \ldots, a_{n-1} \in \mathbb{R} \Big\}.$$

Thus, we may identify $v \in K^{(n)}$ with the polynomial $\varphi \in \Pi^{(n-1)}$, where

$$v = \sum_{j=0}^{n-1} a_j A^j r^{(0)} =: \varphi(A)\, r^{(0)}.$$

We are interested in an orthonormal basis for the polynomial space. To this end, we transfer the Euclidean inner product on the Krylov subspace to the polynomial space. For $v = \varphi(A)\, r^{(0)}$ and $u = \psi(A)\, r^{(0)}$ we have

$$\langle \psi, \varphi \rangle := (r^{(0)})^T\, \psi(A)\, \varphi(A)\, r^{(0)} = u^T v. \qquad (2.2)$$

The number $L$ of (2.1) plays a specific role.

Lemma 2.1. If $\langle \psi, \psi \rangle = 0$ and $\psi \neq 0$, then $\psi$ has degree $L$ and the zeros of $\psi$ are eigenvalues of $A$.

Proof: Since $K^{(L)}$ is an invariant subspace under $A$, it has a basis of eigenvectors $z^{(n)}$, $n = 1, \ldots, L$. Hence, we have $r^{(0)} = \sum_{n=1}^{L} a_n z^{(n)}$, where $a_1, \ldots, a_L \in \mathbb{R}$. Let $\lambda^{(n)}$ denote the eigenvalue of $A$ associated with $z^{(n)}$. Since $A^j r^{(0)} = \sum_{n=1}^{L} a_n (\lambda^{(n)})^j z^{(n)}$ and $\dim K^{(L)} = L$, we have $a_n \neq 0$ for $n = 1, \ldots, L$. From $\langle \psi, \psi \rangle = 0$ we have $0 = \psi(A)\, r^{(0)} = \sum_{n=1}^{L} a_n\, \psi(\lambda^{(n)})\, z^{(n)}$ and thus $\psi(\lambda^{(n)}) = 0$ for $n = 1, \ldots, L$.

Given the inner product (2.2), the Stieltjes algorithm (see, e.g., Gautschi [6]) may be used to construct an orthonormal basis for the polynomial space. Note that rewriting this algorithm in terms of vectors (and replacing $t$ by $A$), we end up with the Lanczos algorithm (cf. Lanczos [10]).
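A small numerical sketch (our own, with hypothetical names) of the identification above: the bilinear form of (2.2) is evaluated by applying the polynomial to $A$ and taking Euclidean inner products of the resulting Krylov vectors. It also previews why indefiniteness matters: $\langle \varphi, \varphi \rangle$ is a squared norm and hence nonnegative, while the $t$-weighted form $\langle \varphi, t\,\varphi \rangle$ used later may become negative.

```python
import numpy as np

def poly_apply(coeffs, A, r0):
    """Evaluate phi(A) r0 for phi(t) = sum_j coeffs[j] t^j (Horner scheme)."""
    v = np.zeros_like(r0)
    for c in reversed(coeffs):
        v = A @ v + c * r0
    return v

rng = np.random.default_rng(0)
A = np.diag(np.linspace(-1.0, 3.0, 50))   # symmetric, indefinite
r0 = rng.standard_normal(50)
phi = [1.0, -2.0, 0.5]                    # phi(t) = 1 - 2t + 0.5 t^2
v = poly_apply(phi, A, r0)
print("<phi, phi>   =", v @ v)            # = ||phi(A) r0||^2 >= 0 always
print("<phi, t*phi> =", v @ (A @ v))      # = v^T A v; may be negative here
```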

Stieltjes Algorithm 2.3. Set $\gamma^{(0)} = \sqrt{\langle 1, 1 \rangle}$, $\pi^{(-1)}(t) = 0$, and $\pi^{(0)}(t) = 1/\gamma^{(0)}$.
For $n = 1, 2, \ldots$ compute
  $\alpha^{(n-1)} = \langle \pi^{(n-1)}, t\, \pi^{(n-1)} \rangle$;
  $\hat\pi^{(n)}(t) = (t - \alpha^{(n-1)})\, \pi^{(n-1)}(t) - \gamma^{(n-1)}\, \pi^{(n-2)}(t)$;
  $\gamma^{(n)} = \sqrt{\langle \hat\pi^{(n)}, \hat\pi^{(n)} \rangle}$;
  If $\gamma^{(n)} \neq 0$, then $\pi^{(n)}(t) = (1/\gamma^{(n)})\, \hat\pi^{(n)}(t)$;
  else $L = n$, $\pi^{(L)} = \hat\pi^{(L)}$, STOP.
end.

For later reference let us introduce the following definition.

Definition 2.2. Given the inner product (2.2), the orthonormal polynomials are polynomials $\pi^{(0)}, \ldots, \pi^{(L)}$ with
a) $\langle \pi^{(k)}, \pi^{(n)} \rangle = 0$ for $k < n$,
b) the leading coefficient of $\pi^{(n)}$ is positive, and
c) $\langle \pi^{(n)}, \pi^{(n)} \rangle = 1$ for $n < L$.

For convenience, we add $\pi^{(L)}$ (normalized as in the Stieltjes Algorithm 2.3) to the set of orthonormal polynomials. Note that $\pi^{(L)}$ fulfills a) and b) but not c) (in particular, we have $\langle \pi^{(L)}, \pi^{(L)} \rangle = 0$).

Introducing $\tau(t) := t$, the row vector $\boldsymbol\Pi^{(n)}(t) := \big( \pi^{(0)}(t), \pi^{(1)}(t), \ldots, \pi^{(n-1)}(t) \big)$, and $e^{(n,j)}$, the $j$th unit vector in $\mathbb{R}^n$, and collecting the recurrence coefficients in the so-called Lanczos matrix

$$T^{(n)} := \begin{pmatrix} \alpha^{(0)} & \gamma^{(1)} & & \\ \gamma^{(1)} & \alpha^{(1)} & \ddots & \\ & \ddots & \ddots & \gamma^{(n-1)} \\ & & \gamma^{(n-1)} & \alpha^{(n-1)} \end{pmatrix} \in \mathbb{R}^{n \times n},$$

we have our next well-known result.

Lemma 2.3. Let $\alpha^{(n)}$, $\gamma^{(n)}$, $\pi^{(n)}$, and $L$ be as in the Stieltjes Algorithm 2.3, and $\boldsymbol\Pi^{(n)}$ and $T^{(n)}$ as above.
a) For $n = 0, \ldots, L-1$, we have
$$\tau\, \boldsymbol\Pi^{(n)} = \boldsymbol\Pi^{(n)}\, T^{(n)} + \gamma^{(n)}\, \pi^{(n)}\, (e^{(n,n)})^T. \qquad (2.4)$$
b) For $n = 1, \ldots, L$, the eigenvalues of $T^{(n)}$ are the zeros of $\pi^{(n)}$.
c) The eigenvalues of $T^{(L)}$ are eigenvalues of $A$.

Proof: Part a): Equation (2.4) follows from the Stieltjes Algorithm 2.3. Part b): If $\pi^{(n)}(\lambda) = 0$, then $\lambda\, \boldsymbol\Pi^{(n)}(\lambda) = \boldsymbol\Pi^{(n)}(\lambda)\, T^{(n)}$. This shows that $\lambda$ is an eigenvalue of $T^{(n)}$. Part c): The eigenvalues of $T^{(L)}$ are the zeros of $\pi^{(L)}$. Lemma 2.1 shows that the zeros of $\pi^{(L)}$ are eigenvalues of $A$.

Let $I^{(n)}$ be the identity matrix in $\mathbb{R}^{n \times n}$ and let

$$\langle P^{(m)}, Q^{(n)} \rangle := \big( \langle p^{(j)}, q^{(k)} \rangle \big)_{j = 0, \ldots, m-1;\ k = 0, \ldots, n-1} \in \mathbb{R}^{m \times n},$$

where $P^{(m)} = (p^{(0)}, \ldots, p^{(m-1)})$ and $Q^{(n)} = (q^{(0)}, \ldots, q^{(n-1)})$. Then we have

$$I^{(n)} = \langle \boldsymbol\Pi^{(n)}, \boldsymbol\Pi^{(n)} \rangle \qquad \text{and} \qquad T^{(n)} = \langle \boldsymbol\Pi^{(n)}, \tau\, \boldsymbol\Pi^{(n)} \rangle.$$
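Rewritten in vectors as indicated above, the Stieltjes Algorithm 2.3 becomes the Lanczos recurrence. A sketch (our naming) that also returns the Lanczos matrix $T^{(n)}$ of Lemma 2.3:

```python
import numpy as np

def lanczos(A, r0, n):
    """Stieltjes Algorithm 2.3 in vector form (t -> A): returns T^{(n)}."""
    gamma0 = np.linalg.norm(r0)            # gamma^{(0)} = sqrt(<1,1>)
    v_prev = np.zeros_like(r0)
    v = r0 / gamma0                        # v^{(0)} = pi^{(0)}(A) r0
    alpha, gamma = [], []
    for k in range(n):
        w = A @ v
        a = v @ w                          # alpha^{(k)} = <pi^{(k)}, t pi^{(k)}>
        w = w - a * v - (gamma[-1] if gamma else 0.0) * v_prev
        g = np.linalg.norm(w)
        alpha.append(a)
        if g == 0.0:                       # invariant subspace found: k+1 = L
            break
        gamma.append(g)
        v_prev, v = v, w / g
    T = np.diag(alpha) + np.diag(gamma[: len(alpha) - 1], 1) \
                       + np.diag(gamma[: len(alpha) - 1], -1)
    return T

# example: 20 steps for a diagonal indefinite matrix
T = lanczos(np.diag(np.linspace(-1.0, 3.0, 200)), np.ones(200), 20)
```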

3 The Conjugate Gradient Algorithm and Orthogonality

The CG iterates belong to a shifted Krylov space and hence

$$x^{(n)} = x^{(0)} + \hat p^{(n)}(A)\, r^{(0)}, \quad \hat p^{(n)} \in \Pi^{(n-1)}, \qquad (3.1)$$
$$r^{(n)} = p^{(n)}(A)\, r^{(0)}, \quad p^{(n)} \in \Pi^{(n)}, \qquad (3.2)$$

where the iteration polynomial $\hat p^{(n)}$ and the residual polynomial $p^{(n)}$ are connected via

$$p^{(n)}(t) = 1 - t\, \hat p^{(n)}(t).$$

This relation shows that the residual polynomials share the important interpolation property $p^{(n)}(0) = 1$. The CG iterates are implicitly defined by $p^{(n)}$, where

$$\langle p^{(n)}, t^{-1} p^{(n)} \rangle = \min\{ \langle p, t^{-1} p \rangle :\ p \in \Pi^{(n)},\ p(0) = 1 \}. \qquad (3.3)$$

For a positive definite $A$ the solution of (3.3) is the $n$th kernel polynomial with respect to the inner product $\langle \cdot, t^{-1} \cdot \rangle$; see, e.g., Szegő [20, Theorem 3.1.3]. It is not hard to see that the kernel polynomials are orthogonal with respect to $\langle \cdot, \cdot \rangle$; see, e.g., Szegő [20, Theorem 3.1.4]. Thus, the orthogonality and the minimization property can be used equivalently to determine the CG residual polynomials; in particular,

$$p^{(n)}(t) = \big( 1/\pi^{(n)}(0) \big)\, \pi^{(n)}(t). \qquad (3.4)$$

For a symmetric, indefinite matrix we may also compute $p^{(n)}$ using equation (3.4). However, since zero belongs to the eigenvalue interval, we can no longer ensure that $\pi^{(n)}(0) \neq 0$ for all $n$. The next theorem claims that no residual polynomial of degree $n$ exists when $\pi^{(n)}(0) = 0$.

Theorem 3.1. Assume that $L > 0$ and let $\pi^{(0)}, \ldots, \pi^{(L)}$ be orthonormal polynomials (cf. Definition 2.2). We have
a) $\pi^{(0)}(0) \neq 0$ and $\pi^{(L)}(0) \neq 0$,
b) $\pi^{(n)}(0) = 0$ is equivalent to $T^{(n)}$ being singular,
c) $\pi^{(n)}(0) \neq 0$ if and only if there exists a residual polynomial $p^{(n)}$,
d) if $\pi^{(n)}(0) = 0$, then $\pi^{(n+1)}(0) \neq 0$.

Proof: Part a): Follows from $\pi^{(0)}(t) = 1/\gamma^{(0)}$ and Lemma 2.1. Part b): Follows from Lemma 2.3 b). Part c): Assume that $\pi^{(n)}(0) \neq 0$. Then $p^{(n)}(t) = (1/\pi^{(n)}(0))\, \pi^{(n)}(t)$ is a residual polynomial. Now assume that $p^{(n)}$ exists. Since $p^{(n)}$ is a multiple of $\pi^{(n)}$ and $p^{(n)}(0) = 1$, we have $\pi^{(n)}(0) \neq 0$. Part d): Follows from the interlacing property; see, e.g., Szegő [20, Theorem 3.3.2].
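Parts b) and c) of Theorem 3.1 give an immediately computable test. A sketch (our naming and tolerance) that flags the degrees $n$ for which the leading section $T^{(n)}$ of the Lanczos matrix is (nearly) singular, i.e., for which no residual polynomial $p^{(n)}$ exists:

```python
import numpy as np

def missing_residual_degrees(T, rtol=1e-12):
    """Return the degrees k for which T^{(k)} is (nearly) singular,
    i.e. pi^{(k)}(0) ~ 0 and no residual polynomial p^{(k)} exists."""
    n = T.shape[0]
    scale = np.linalg.norm(T)
    bad = []
    for k in range(1, n + 1):
        sigma_min = np.linalg.svd(T[:k, :k], compute_uv=False)[-1]
        if sigma_min <= rtol * scale:
            bad.append(k)
    return bad
```

Combined with the `lanczos` sketch above, applying this to a shifted matrix $A - \lambda I$ reproduces the breakdown situations provoked in Section 7.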

4 The Mixed Recurrence Relation

Our next goal is to obtain a computational criterion to decide whether a residual polynomial exists or not. We present update formulae for the computation of the residual polynomials. To this end, we introduce a sequence of formally orthogonal polynomials. In the context of the CG algorithm this is similar to an $LDL^T$ decomposition of a scaled Lanczos matrix, whereas in the context of orthogonal polynomials it is the mixed recurrence relation (see, e.g., Fischer [4, §2.5]). The formally orthogonal polynomials provide a useful tool to decide whether a residual polynomial exists or not.

For $\varphi, \psi \in \Pi^{(L)}$, we consider the symmetric bilinear form $\langle \varphi, t\,\psi \rangle$. For an inner product associated with an indefinite matrix $A$, we may have $\langle \varphi, t\,\varphi \rangle \leq 0$ for some $\varphi \neq 0$, so that this bilinear form need not define an inner product (this depends on $r^{(0)}$). A polynomial $q^{(n)}$ of degree $n$ that satisfies

$$\langle q^{(n)}, t\,\varphi \rangle = 0 \qquad \text{for all } \varphi \in \Pi^{(n-1)}$$

is called formally orthogonal to $\Pi^{(n-1)}$. Formally orthogonal polynomials need not exist for every degree $n$ (cf. Lemma 4.1; see also Struble [19] and Gutknecht [8, Theorem 1.8]). For convenience, let us call two polynomials $q^{(n)}$ and $p^{(n)}$ an orthogonal pair when $q^{(n)}$ is formally orthogonal to $\Pi^{(n-1)}$, $p^{(n)}$ is orthogonal to $\Pi^{(n-1)}$, $p^{(n)}(0) = 1$, and the leading coefficients of $q^{(n)}$ and $p^{(n)}$ coincide. Thus, for an orthogonal pair, $p^{(n)}$ is a residual polynomial, $\langle q^{(n)}, p^{(n)} \rangle = \langle p^{(n)}, p^{(n)} \rangle$, and $\langle q^{(n)}, t\, q^{(n)} \rangle = \langle p^{(n)}, t\, q^{(n)} \rangle$. Note that this definition is not symmetric in $q^{(n)}$ and $p^{(n)}$.

The following lemma shows the connection between the lack of existence of formally orthogonal polynomials and residual polynomials. It turns out that

$$e^{(n)} := \frac{\langle p^{(n)}, t\, q^{(n)} \rangle}{\rho^{(n)}}, \qquad \rho^{(n)} := \langle p^{(n)}, p^{(n)} \rangle, \qquad (4.1)$$

is the crucial number. The following lemma collects some properties.

Lemma 4.1. Assume $q^{(n)}$ and $p^{(n)}$ to be an orthogonal pair and let $e^{(n)}$ be defined as in equation (4.1). The following statements are equivalent.
a) $\pi^{(n+1)}(0) = 0$.
b) $e^{(n)} = 0$.
c) There exists no polynomial $q^{(n+1)}$ of degree $n+1$ formally orthogonal to $\Pi^{(n)}$.

Proof: a) equivalent b): Since $q^{(n)}$ is formally orthogonal to $\Pi^{(n-1)}$, we have, with coefficients $a_{n+1} \neq 0$ and $a_n$,
$$t\, q^{(n)} = a_{n+1}\, \pi^{(n+1)} + a_n\, \pi^{(n)}.$$
Evaluating at $t = 0$ gives $0 = a_{n+1}\, \pi^{(n+1)}(0) + a_n\, \pi^{(n)}(0)$, and taking the inner product with $p^{(n)}$ gives $e^{(n)} \rho^{(n)} = \langle p^{(n)}, t\, q^{(n)} \rangle = a_n \langle p^{(n)}, \pi^{(n)} \rangle$ with $\langle p^{(n)}, \pi^{(n)} \rangle \neq 0$. Since $\pi^{(n)}(0) \neq 0$ (Theorem 3.1 c)), we conclude that $e^{(n)} = 0$ if and only if $a_n = 0$, which holds if and only if $\pi^{(n+1)}(0) = 0$.
b) equivalent c): Suppose $e^{(n)} = 0$ and that there exists a formally orthogonal polynomial $q^{(n+1)}$. With some coefficients $b_{n+1}$, $b_n$ and a polynomial $\varphi \in \Pi^{(n-1)}$ we have $q^{(n+1)} = b_{n+1}\, t\, q^{(n)} + b_n\, q^{(n)} + \varphi$. For $q^{(n+1)}$ formally orthogonal to $\Pi^{(n)}$ we get
$$0 = \langle q^{(n+1)}, t\, q^{(n)} \rangle = b_{n+1} \langle t\, q^{(n)}, t\, q^{(n)} \rangle + b_n\, e^{(n)} \rho^{(n)}.$$
Lemma 2.1 ensures that $\langle t\, q^{(n)}, t\, q^{(n)} \rangle \neq 0$. Thus, $e^{(n)} = 0$ implies $b_{n+1} = 0$, and hence $q^{(n+1)} \in \Pi^{(n)}$, a contradiction. Now suppose $e^{(n)} \neq 0$. From part a) we get $\pi^{(n+1)}(0) \neq 0$, and from Theorem 3.1 b) we get that $T^{(n+1)}$ is nonsingular. Setting $q^{(n+1)} = \pi^{(n+1)} - \boldsymbol\Pi^{(n+1)} c$, where $T^{(n+1)} c = \gamma^{(n+1)} e^{(n+1,n+1)}$, we get a formally orthogonal polynomial of degree $n+1$.

Theorem 3.1 and Lemma 4.1 yield the necessary tool to decide whether the residual polynomial of degree $n+1$ exists or not: computing $e^{(n)}$ (cf. (4.1)), we may check whether $\pi^{(n+1)}(0) \neq 0$, and hence $p^{(n+1)}$ exists, or not.
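In vector form (cf. Section 6), the crucial number is available at the cost of dot products the CG iteration computes anyway. A minimal sketch (our naming; arguments are NumPy arrays):

```python
def breakdown_indicator(A, q, r):
    """e^{(n)} = <p, t q>/<p, p> in vector form: (q^T A q)/(r^T r).
    e == 0 (Lemma 4.1): no residual polynomial of degree n+1 exists;
    a tiny |e| signals a near breakdown."""
    return (q @ (A @ q)) / (r @ r)
```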

The next lemma gives information about the connection between orthogonal pairs and orthonormal polynomials. For an orthogonal pair $q^{(n)}$ and $p^{(n)}$, we may set

$$z^{(n+1)} := t\, q^{(n)} - e^{(n)}\, p^{(n)}, \quad \delta^{(n+1)} := \langle z^{(n+1)}, z^{(n+1)} \rangle, \quad e^{(n+1)} := \langle z^{(n+1)}, t\, z^{(n+1)} \rangle / \delta^{(n+1)}, \qquad (4.2)$$

where $e^{(n)}$ is given by equation (4.1). Note that $z^{(n+1)}$ is orthogonal to $\Pi^{(n)}$.

Lemma 4.2. Assume $q^{(n)}$ and $p^{(n)}$ to be an orthogonal pair. Then
a) $p^{(n)} = (-1)^n \sqrt{\omega^{(n)}}\; \pi^{(n)}$, where $\omega^{(n)} := \left( \dfrac{\gamma^{(0)} \gamma^{(1)} \cdots \gamma^{(n)}}{\det(T^{(n)})} \right)^2$, $\det(T^{(0)}) := 1$;  (4.3)
b) $\pi^{(n+1)} = \big( (-1)^n / \sqrt{\delta^{(n+1)}} \big)\, z^{(n+1)}$, where $z^{(n+1)}$ is defined by equation (4.2).

Note that, by part a), $\rho^{(n)} = \langle p^{(n)}, p^{(n)} \rangle = \omega^{(n)}$.

Proof: Part a): For $n = 0$ we have $p^{(0)} = \gamma^{(0)} \pi^{(0)} = 1$. For $n > 0$, let $\eta^{(n)} \in \mathbb{R}^n$ be such that $T^{(n)} \eta^{(n)} = \gamma^{(0)} e^{(n,1)}$, so that $\boldsymbol\Pi^{(n)} T^{(n)} \eta^{(n)} = \gamma^{(0)} \pi^{(0)} = 1$. Then, by (2.4),
$$p^{(n)} = 1 - \tau\, \boldsymbol\Pi^{(n)} \eta^{(n)} = 1 - \boldsymbol\Pi^{(n)} T^{(n)} \eta^{(n)} - \gamma^{(n)} \eta^{(n)}_n\, \pi^{(n)} = -\gamma^{(n)} \eta^{(n)}_n\, \pi^{(n)}.$$
Applying Cramer's rule, we obtain
$$\eta^{(n)}_n = (-1)^{n-1}\, \frac{\gamma^{(0)} \gamma^{(1)} \cdots \gamma^{(n-1)}}{\det(T^{(n)})},$$
and hence $p^{(n)} = (-1)^n \sqrt{\omega^{(n)}}\; \pi^{(n)}$.
Part b): Since $q^{(n)}$ and $p^{(n)}$ are an orthogonal pair, $z^{(n+1)}$ is orthogonal to $\Pi^{(n)}$. The identity follows since the leading coefficients of $z^{(n+1)}$ and $(-1)^n \sqrt{\delta^{(n+1)}}\; \pi^{(n+1)}$ coincide.

Assume $q^{(n)}$ and $p^{(n)}$ to be an orthogonal pair. For convenience, let $\omega^{(n)}$ be as in (4.3), and for $k \geq n$ let
$$w^{(n)} := \big( (-1)^n / \sqrt{\omega^{(n)}} \big)\, q^{(n)}, \qquad w^{(k+1)} := \sigma^{(k+1)}\, \pi^{(k+1)} + w^{(k)}, \quad \sigma^{(k+1)} := -\langle \pi^{(k)}, t\, w^{(k)} \rangle / \gamma^{(k+1)},$$
and
$$T^{(n,m)} := \langle \boldsymbol\Pi^{(n,m)}, t\, \widetilde{\boldsymbol\Pi}^{(n,m)} \rangle, \quad \text{where } \boldsymbol\Pi^{(n,m)} := (\pi^{(n)}, \ldots, \pi^{(m-1)}) \text{ and } \widetilde{\boldsymbol\Pi}^{(n,m)} := (w^{(n)}, \ldots, w^{(m-1)}).$$

Lemma 4.3. Assume $q^{(n)}$ and $p^{(n)}$ to be an orthogonal pair and that for $n < m \leq L$ we have $\pi^{(m)}(0) \neq 0$.
a) Then
$$\omega^{(m)} = \omega^{(n)} \left( \frac{\gamma^{(n+1)} \cdots \gamma^{(m)}}{\det(T^{(n,m)})} \right)^2.$$
b) For $k = n, \ldots, m$, we have $\langle \varphi, t\, w^{(k)} \rangle = 0$ for all $\varphi \in \Pi^{(k-1)}$.
c) An orthogonal pair $q^{(m)}$ and $p^{(m)}$ may be computed by
$$p^{(m)} = (-1)^m \sqrt{\omega^{(m)}}\; \pi^{(m)}, \qquad q^{(m)} = (-1)^m \big( \sqrt{\omega^{(m)}} / \sigma^{(m)} \big)\, w^{(m)}.$$

Proof: Part a): We have $(\boldsymbol\Pi^{(n)}, \widetilde{\boldsymbol\Pi}^{(n,m)}) = \boldsymbol\Pi^{(m)} R^{(m)}$, where $R^{(m)} \in \mathbb{R}^{m \times m}$ is an upper triangular matrix with diagonal entries all one. Since $\langle \boldsymbol\Pi^{(n)}, t\, \widetilde{\boldsymbol\Pi}^{(n,m)} \rangle = 0$, we get
$$\det(T^{(m)}) = \det \begin{pmatrix} T^{(n)} & 0 \\ \langle \boldsymbol\Pi^{(n,m)}, t\, \boldsymbol\Pi^{(n)} \rangle & T^{(n,m)} \end{pmatrix} = \det(T^{(n)})\, \det(T^{(n,m)}).$$
From Lemma 4.2 we have
$$\omega^{(m)} = \left( \frac{\gamma^{(0)} \cdots \gamma^{(m)}}{\det(T^{(m)})} \right)^2 = \omega^{(n)} \left( \frac{\gamma^{(n+1)} \cdots \gamma^{(m)}}{\det(T^{(n,m)})} \right)^2.$$
Part b): The proof is by induction. For $k = n$ the statement is true since $q^{(n)}$ is formally orthogonal to $\Pi^{(n-1)}$. Now assume the statement to be true for $k$. For $\varphi \in \Pi^{(k-1)}$ we have $\langle t\,\varphi, \pi^{(k+1)} \rangle = 0$ and $\langle \varphi, t\, w^{(k)} \rangle = 0$, so it remains to show that
$$0 = \langle \pi^{(k)}, t\, w^{(k+1)} \rangle = \sigma^{(k+1)} \langle \pi^{(k)}, t\, \pi^{(k+1)} \rangle + \langle \pi^{(k)}, t\, w^{(k)} \rangle.$$
But this follows from $\langle \pi^{(k)}, t\, \pi^{(k+1)} \rangle = \gamma^{(k+1)}$ and the definition of $\sigma^{(k+1)}$.
Part c): Follows from Lemma 4.2 and part b).

We state some special cases of Lemmas 4.2 and 4.3 as corollaries.

Corollary 4.4. Assume $q^{(n)}$ and $p^{(n)}$ to be an orthogonal pair and $e^{(n)} \neq 0$. Then $\rho^{(n+1)} = \delta^{(n+1)}/(e^{(n)})^2 = \rho^{(n)} \big( \gamma^{(n+1)}/e^{(n)} \big)^2$. An orthogonal pair $q^{(n+1)}$ and $p^{(n+1)}$ may be computed by
$$p^{(n+1)} = \big( -1/e^{(n)} \big)\, z^{(n+1)} \qquad \text{and} \qquad q^{(n+1)} = p^{(n+1)} + \big( \rho^{(n+1)}/\rho^{(n)} \big)\, q^{(n)}.$$
Proof: Follows from Lemma 4.2 part b) and Lemma 4.3.
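In vector terms, Corollary 4.4 is precisely the classical CG update. A sketch (our naming; $r$, $q$, $\rho$ translated as in Section 6):

```python
import numpy as np

def definite_step(A, x, r, q, rho):
    """One definite step (Corollary 4.4); assumes e^{(n)} != 0."""
    Aq = A @ q
    e = (q @ Aq) / rho                 # e^{(n)} of (4.1)
    z = Aq - e * r                     # z^{(n+1)} of (4.2)
    x_new = x + (1.0 / e) * q
    r_new = -(1.0 / e) * z             # p^{(n+1)} = (-1/e) z^{(n+1)}
    rho_new = r_new @ r_new            # = delta^{(n+1)} / e^2
    q_new = r_new + (rho_new / rho) * q
    return x_new, r_new, q_new, rho_new
```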

For an orthogonal pair $q^{(n)}$ and $p^{(n)}$, with $e^{(n)}$ as in equation (4.1) and $z^{(n+1)}$, $\delta^{(n+1)}$, and $e^{(n+1)}$ as in equation (4.2), we introduce

$$\hat T^{(n,n+2)} := \big\langle (q^{(n)}, z^{(n+1)}),\ t\,(q^{(n)}, z^{(n+1)}) \big\rangle = \begin{pmatrix} \rho^{(n)} e^{(n)} & \delta^{(n+1)} \\ \delta^{(n+1)} & \delta^{(n+1)} e^{(n+1)} \end{pmatrix} \quad \text{and}$$
$$d^{(n+2)} := \det(\hat T^{(n,n+2)}) = \rho^{(n)}\, \delta^{(n+1)}\, e^{(n)}\, e^{(n+1)} - (\delta^{(n+1)})^2. \qquad (4.4)$$

Corollary 4.5. Assume $q^{(n)}$ and $p^{(n)}$ to be an orthogonal pair and $\hat T^{(n,n+2)}$ to be nonsingular. An orthogonal pair $q^{(n+2)}$ and $p^{(n+2)}$ may be computed by
$$p^{(n+2)} = p^{(n)} - t\,(q^{(n)}, z^{(n+1)})\, a^{(n+2)}, \quad \text{with } \hat T^{(n,n+2)} a^{(n+2)} = \rho^{(n)} e^{(2,1)},$$
$$q^{(n+2)} = p^{(n+2)} - (q^{(n)}, z^{(n+1)})\, b^{(n+2)}, \quad \text{with } \hat T^{(n,n+2)} b^{(n+2)} = -\big( \rho^{(n+2)}/a^{(n+2)}_2 \big)\, e^{(2,2)}.$$

Proof: From $\langle \boldsymbol\Pi^{(n)}, p^{(n)} \rangle = \langle \boldsymbol\Pi^{(n)}, t\, q^{(n)} \rangle = \langle \boldsymbol\Pi^{(n)}, t\, z^{(n+1)} \rangle = 0$, $\hat T^{(n,n+2)} a^{(n+2)} = \rho^{(n)} e^{(2,1)}$, and
$$\big\langle (q^{(n)}, z^{(n+1)}),\ p^{(n+2)} \big\rangle = \rho^{(n)} e^{(2,1)} - \hat T^{(n,n+2)} a^{(n+2)} = 0,$$
we have that $p^{(n+2)}$ is orthogonal to $\Pi^{(n+1)}$. Since $p^{(n+2)}(0) = p^{(n)}(0) = 1$, $p^{(n+2)}$ is a residual polynomial. Obviously, $q^{(n+2)}$ has the same degree and leading coefficient as $p^{(n+2)}$. From $\langle \boldsymbol\Pi^{(n)}, t\, p^{(n+2)} \rangle = \langle \boldsymbol\Pi^{(n)}, t\, q^{(n)} \rangle = \langle \boldsymbol\Pi^{(n)}, t\, z^{(n+1)} \rangle = 0$ we have $\langle \boldsymbol\Pi^{(n)}, t\, q^{(n+2)} \rangle = 0$, and from $\hat T^{(n,n+2)} b^{(n+2)} = -( \rho^{(n+2)}/a^{(n+2)}_2 )\, e^{(2,2)}$,
$$\big\langle (q^{(n)}, z^{(n+1)}),\ t\, q^{(n+2)} \big\rangle = \langle p^{(n+2)}, t\, z^{(n+1)} \rangle\, e^{(2,2)} - \hat T^{(n,n+2)} b^{(n+2)} = 0,$$
together with $\langle p^{(n+2)}, t\, z^{(n+1)} \rangle = -\rho^{(n+2)}/a^{(n+2)}_2$, we have that $q^{(n+2)}$ is formally orthogonal to $\Pi^{(n+1)}$.

Corollary 4.4 gives the mixed recurrence relations for the residual and the formally orthogonal polynomials (see, e.g., Fischer [4, §2.5]). Corollary 4.5 gives update formulae for $p^{(n+2)}$ and $q^{(n+2)}$, assuming that $\hat T^{(n,n+2)}$ is nonsingular. We stress that no assumption about the existence of $q^{(n+1)}$ as a formally orthogonal polynomial or of $p^{(n+1)}$ as a residual polynomial has been made. The next lemma shows that this update formula is applicable precisely when a residual polynomial fails to exist. In particular, we then have $z^{(n+1)} = t\, q^{(n)}$ (see also Modersitzki [12]).

Lemma 4.6. Assume that $q^{(n)}$ is formally orthogonal to $\Pi^{(n-1)}$ and $e^{(n)} = 0$. Then $\hat T^{(n,n+2)}$ (cf. (4.4)) is nonsingular.

Proof: From $e^{(n)} = 0$ we have $\det(\hat T^{(n,n+2)}) = -(\delta^{(n+1)})^2 \neq 0$ (cf. Lemma 2.1).

However, there is one obvious disadvantage in this choice: the polynomial $t\, q^{(n)}$ is orthogonal to $\Pi^{(n)}$ only for $e^{(n)} = 0$.

The Block $LDL^T$ Decomposition

We show the connection between the formally orthogonal polynomials and a block $LDL^T$ decomposition of the Lanczos matrix. To this end, we set
$$\tilde q^{(n)} := \begin{cases} \pi^{(n)} & \text{if } n = 0 \text{ or } \pi^{(n)}(0) = 0, \\ \big( (-1)^n / \sqrt{\omega^{(n)}} \big)\, q^{(n)} & \text{otherwise.} \end{cases}$$
Thus, $\tilde q^{(j)}$, $j = 0, \ldots, n-1$, form a basis of $\Pi^{(n-1)}$. Hence, there exists a lower triangular matrix $L^{(n)} \in \mathbb{R}^{n \times n}$ with diagonal elements all one and
$$Q^{(n)}(t)\, (L^{(n)})^T = \boldsymbol\Pi^{(n)}(t), \qquad (4.5)$$

where $Q^{(n)} = (\tilde q^{(0)}, \ldots, \tilde q^{(n-1)})$. From the orthogonality relations, respectively the formal orthogonality relations, we have that

$$D^{(n)} := \langle Q^{(n)}, t\, Q^{(n)} \rangle \in \mathbb{R}^{n \times n} \qquad (4.6)$$

is a block diagonal, regular matrix with diagonal blocks of size 1-by-1 or 2-by-2. Combining (2.4), (4.5), and (4.6), we get

$$T^{(n)} = \langle \boldsymbol\Pi^{(n)}, t\, \boldsymbol\Pi^{(n)} \rangle = L^{(n)}\, D^{(n)}\, (L^{(n)})^T.$$

Thus, $L^{(n)}$ and $D^{(n)}$ may be viewed as the factors of a block $LDL^T$ decomposition of $T^{(n)}$. For more details of the decomposition we refer to Modersitzki [12].

5 Conjugate Gradient Type Algorithms for Indefinite Linear Systems

We propose three algorithms exploiting the recurrence relations given in Section 4. The criterion whether to use $p^{(n+1)}$ or $p^{(n+2)}$ as the next residual polynomial is based on the size of $\rho^{(n+1)}$, where we set $\rho^{(n)} := \infty$ if $\pi^{(n)}(0) = 0$.

5.1 The CGI Algorithm for Indefinite Linear Systems

The conjugate gradient algorithm for indefinite systems (CGI algorithm) reads as follows. Starting with $p^{(0)}(t) = 1$, $q^{(0)} = p^{(0)}$, and $n = 0$, we may assume $q^{(n)}$ and $p^{(n)}$ to be an orthogonal pair with $\rho^{(n)} \neq 0$. For $n < j \leq m$ we may compute $\gamma^{(j)}$, $T^{(n,j)}$, and $w^{(j)}$, where $m \leq L$ is chosen such that
$$\gamma^{(n+1)} \cdots \gamma^{(m)} \leq \big| \det(T^{(n,m)}) \big|, \quad \text{i.e., } \rho^{(m)} \leq \rho^{(n)} \text{ (cf. Lemma 4.3 a)).}$$
A new orthogonal pair may be computed using the formulae given in Lemma 4.3.

5.2 The CGI(2) Algorithm for Indefinite Linear Systems

We also present the CGI(2) algorithm, a variant of the CGI algorithm. Here, a new orthogonal pair $q^{(m)}$ and $p^{(m)}$ is computed, where
$$m := \begin{cases} n+1 & \text{if } \rho^{(n+1)} \leq \max\{ \rho^{(n)}, \rho^{(n+2)} \}, \\ n+2 & \text{otherwise.} \end{cases}$$
In contrast to the CGI algorithm, we cannot guarantee that $\rho^{(m)} \leq \rho^{(n)}$. It is known that the numerical accuracy of an algorithm depends on "spikes", i.e., large intermediate values $\rho^{(n)}$. Following the argumentation of Sleijpen, van der Vorst, and Fokkema [17, §2.1], the best we can expect for the difference between the updated residual and the true residual is
$$\big|\, \|r^{(n)}\| - \|b - A\, x^{(n)}\| \,\big| \leq C\, n\, \|A^{-1}\|\, \max\{ \|r^{(j)}\| :\ j = 1, \ldots, n \}, \qquad (5.1)$$
where the constant $C$ depends on the maximum number of non-zero elements of $A$, the size of the entries of $A$, and the relative machine precision. The estimate (5.1) shows that a large residual norm may destroy the numerical accuracy. However, our numerical experiments show that in nearly all examples the CGI(2) algorithm leads to sufficient results.

5.3 The CSCG Algorithm for Indefinite Linear Systems

We give a modification of the composite step bi-conjugate gradient algorithm of Bank and Chan [2], which we refer to as the CSCG algorithm. Mathematically, the CGI(2) algorithm and the CSCG algorithm are identical. However, a different implementation is used to realize the CGI(2) algorithm; in particular, a different scaling and a different updating of the formally orthogonal polynomials are used. The CSCG algorithm needs more floating point operations per step but about the same number of steps. The details are given in Section 6.
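For the indefinite step of Corollary 4.5, a generic sketch (our naming): instead of the paper's explicitly scaled coefficients, it solves the two 2-by-2 systems with $\hat T^{(n,n+2)}$ directly. By Lemma 4.6 this matrix is nonsingular exactly in the breakdown situation $e^{(n)} = 0$.

```python
import numpy as np

def composite_step(A, x, r, q, rho):
    """One indefinite (composite) step over span{q, z} (cf. Corollary 4.5)."""
    Aq = A @ q
    e = (q @ Aq) / rho                       # e^{(n)} of (4.1)
    z = Aq - e * r                           # z^{(n+1)} of (4.2)
    Az = A @ z
    That = np.array([[q @ Aq, q @ Az],
                     [z @ Aq, z @ Az]])      # hat T^{(n,n+2)} of (4.4)
    a = np.linalg.solve(That, np.array([q @ r, z @ r]))
    x_new = x + a[0] * q + a[1] * z
    r_new = r - a[0] * Aq - a[1] * Az        # Galerkin: r_new orthogonal to span{q, z}
    Ar = A @ r_new
    b = np.linalg.solve(That, np.array([q @ Ar, z @ Ar]))
    q_new = r_new - b[0] * q - b[1] * z      # A-conjugate against the 2-by-2 block
    return x_new, r_new, q_new, r_new @ r_new
```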

6 Implementation Details

Since we are mainly interested in the solution of a linear system of equations, we use vectors instead of polynomials to formulate the algorithms. Recall that $x^{(n)} = x^{(0)} + \hat p^{(n)}(A)\, r^{(0)}$ and $r^{(n)} = p^{(n)}(A)\, r^{(0)}$ (cf. (3.1) and (3.2)). The update rule given in Corollary 4.4 translates to $r^{(n+1)} = r^{(n)} - (1/e^{(n)})\, A q^{(n)}$, and the interpolation property of the residual polynomials yields $x^{(n+1)} = x^{(n)} + (1/e^{(n)})\, q^{(n)}$, where $q^{(n)} := q^{(n)}(A)\, r^{(0)}$. Since $\rho^{(n)} = \langle p^{(n)}, p^{(n)} \rangle = (r^{(n)})^T r^{(n)}$ (provided $p^{(n)}$ exists), an iteration may be stopped when $\sqrt{\rho^{(n)}} \leq \mathrm{tol}$, where tol is a given tolerance, e.g., tol $= 10^{-8}$. For a discussion of stopping criteria see, e.g., Barrett et al. [3, §4.2].

We are interested in the computational costs of the algorithms. To this end, we introduce the notation MV for a matrix-vector multiplication, DOT for a dot product, and N-FLOPS for any computation requiring $N$ floating point operations. E.g., a vector update of the form $y = \alpha x + y$, where $x, y \in \mathbb{R}^N$ and $\alpha \in \mathbb{R}$, requires 2N-FLOPS. Let us also introduce the notation $[S]$, where $S$ is any true-or-false statement.

6.1 The Implementation of the CGI Algorithm

We give an algorithmic description of the CGI algorithm (see also Section 5.1).

The CGI Algorithm 6.1. Given a symmetric, regular matrix $A$, a right hand side $b$, an initial guess $x^{(0)}$, and a tolerance tol, the CGI algorithm generates approximations $x^{(n)}$ to the solution of (1.1) such that $\| r^{(L)} \|_2 \leq \mathrm{tol}$ for a certain $L \in \mathbb{N}$.

Set $n = 0$, $r^{(0)} = b - A x^{(0)}$, $q^{(0)} = r^{(0)}$, and $\rho^{(0)} = (r^{(0)})^T r^{(0)}$.
While $[\sqrt{\rho^{(n)}} > \mathrm{tol}]$ do
  $m = n + 1$;
  $e^{(n)} = (q^{(n)})^T A q^{(n)} / \rho^{(n)}$;
  $z^{(m)} = A q^{(n)} - e^{(n)} r^{(n)}$;
  $\delta^{(m)} = (z^{(m)})^T z^{(m)}$;
  DEF $= [\sqrt{\delta^{(m)}} \leq |e^{(n)}|\, \sqrt{\rho^{(n)}}]$.
  If not DEF, then
    $\gamma^{(m)} = \sqrt{\delta^{(m)}}/\sqrt{\rho^{(n)}}$;  $\sigma^{(n)} = 1$;  $\sigma^{(m)} = -e^{(n)}/\gamma^{(m)}$;
    $v^{(n)} = \big( (-1)^n/\sqrt{\rho^{(n)}} \big)\, r^{(n)}$;  $v^{(m)} = \big( (-1)^n/\sqrt{\delta^{(m)}} \big)\, z^{(m)}$;
    $w^{(n)} = \big( (-1)^n/\sqrt{\rho^{(n)}} \big)\, q^{(n)}$;  $w^{(m)} = \sigma^{(m)} v^{(m)} + w^{(n)}$;
    $y^{(n)} = \big( (-1)^n/\sqrt{\rho^{(n)}} \big)\, x^{(n)}$;  $y^{(m)} = -(w^{(n)} + e^{(n)} y^{(n)})/\gamma^{(m)}$;
    While not $[\gamma^{(n+1)} \cdots \gamma^{(m)} \leq |\det(T^{(n,m)})|]$ do
      $m = m + 1$;
      $\alpha^{(m-1)} = (v^{(m-1)})^T A v^{(m-1)}$;
      $\hat v^{(m)} = A v^{(m-1)} - \alpha^{(m-1)} v^{(m-1)} - \gamma^{(m-1)} v^{(m-2)}$;
      $\hat y^{(m)} = v^{(m-1)} + \alpha^{(m-1)} y^{(m-1)} + \gamma^{(m-1)} y^{(m-2)}$;
      $\gamma^{(m)} = \sqrt{(\hat v^{(m)})^T \hat v^{(m)}}$;
      $v^{(m)} = (1/\gamma^{(m)})\, \hat v^{(m)}$;  $y^{(m)} = (-1/\gamma^{(m)})\, \hat y^{(m)}$;
      $\sigma^{(m)} = -\big( \gamma^{(m-1)} \sigma^{(m-2)} + \alpha^{(m-1)} \sigma^{(m-1)} \big)/\gamma^{(m)}$;
      $w^{(m)} = \sigma^{(m)} v^{(m)} + w^{(m-1)}$;
    end.
    $\rho^{(m)} = \rho^{(n)} \big( \gamma^{(n+1)} \cdots \gamma^{(m)} / \det(T^{(n,m)}) \big)^2$;
    $r^{(m)} = (-1)^m \sqrt{\rho^{(m)}}\, v^{(m)}$;
    $x^{(m)} = (-1)^m \sqrt{\rho^{(m)}}\, y^{(m)}$;
    $q^{(m)} = (-1)^m \big( \sqrt{\rho^{(m)}}/\sigma^{(m)} \big)\, w^{(m)}$;
  else
    $\rho^{(m)} = \delta^{(m)}/(e^{(n)})^2$;
    $r^{(m)} = -(1/e^{(n)})\, z^{(m)}$;
    $x^{(m)} = x^{(n)} + (1/e^{(n)})\, q^{(n)}$;
    $q^{(m)} = r^{(m)} + \big( \rho^{(m)}/\rho^{(n)} \big)\, q^{(n)}$;
  end.
  $n = m$;
end.
Set $L = n$.

For the CGI(2) algorithm we replace the first if-statement of Algorithm 6.1 by the following one. If $\hat T^{(n,n+2)}$ is not singular, i.e., $d^{(n+2)} \neq 0$, we compute $z^{(n+2)} = r^{(n+2)}$ and omit the additional scaling used in the CGI algorithm.

If not DEF, then
  $m = m + 1$;
  $e^{(n+1)} = (z^{(n+1)})^T A z^{(n+1)} / \delta^{(n+1)}$;
  $d^{(m)} = \rho^{(n)}\, \delta^{(n+1)}\, e^{(n)}\, e^{(n+1)} - (\delta^{(n+1)})^2$;
  DEF $= [d^{(m)} = 0]$.
end.
If not DEF, then
  $a^{(m)}_1 = \delta^{(n+1)} e^{(n+1)} \rho^{(n)} / d^{(m)}$;  $a^{(m)}_2 = -\delta^{(n+1)} \rho^{(n)} / d^{(m)}$;
  $z^{(m)} = r^{(n)} - a^{(m)}_1\, A q^{(n)} - a^{(m)}_2\, A z^{(n+1)}$;
  $\delta^{(m)} = (z^{(m)})^T z^{(m)}$;
  DEF $= [\sqrt{\delta^{(n+1)}} \leq |e^{(n)}|\, \sqrt{\delta^{(m)}}]$.
end.
If not DEF, then
  $\rho^{(m)} = \delta^{(m)}$;  $r^{(m)} = z^{(m)}$;
  $x^{(m)} = x^{(n)} + a^{(m)}_1 q^{(n)} + a^{(m)}_2 z^{(n+1)}$;
  $b^{(m)}_1 = \delta^{(n+1)} \rho^{(m)} / (d^{(m)} a^{(m)}_2)$;  $b^{(m)}_2 = -e^{(n)} \rho^{(n)} \rho^{(m)} / (d^{(m)} a^{(m)}_2)$;
  $q^{(m)} = r^{(m)} - b^{(m)}_1 q^{(n)} - b^{(m)}_2 z^{(n+1)}$;
end.
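A compact CGI(2)-style driver (our sketch, not the paper's tuned implementation above): it tries the definite step and accepts it only when it does not increase the residual norm, which is a simplified reading of the DEF test; the full CGI(2) criterion additionally compares against the composite candidate $\rho^{(n+2)}$. It reuses `composite_step` from the earlier sketch.

```python
import numpy as np

def cgi2(A, b, x0, tol=1e-8, maxit=2000):
    """Definite CG steps where safe, composite 2x2 steps otherwise."""
    x = x0.copy()
    r = b - A @ x
    q = r.copy()
    rho = r @ r
    n = 0
    while np.sqrt(rho) > tol and n < maxit:
        Aq = A @ q
        e = (q @ Aq) / rho
        z = Aq - e * r
        delta = z @ z
        if np.sqrt(delta) <= abs(e) * np.sqrt(rho):   # DEF: residual would not grow
            x = x + (1.0 / e) * q
            r = -(1.0 / e) * z
            rho_new = r @ r
            q = r + (rho_new / rho) * q
            rho = rho_new
            n += 1
        else:                                         # indefinite (composite) step
            x, r, q, rho = composite_step(A, x, r, q, rho)
            n += 2
    return x, n
```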

We discuss the CGI(2) algorithm. We start by noting that the computation of $\hat T^{(n,n+2)}$ (cf. (4.4)) requires one additional MV and one additional DOT. We have to distinguish the computational costs of four different possible steps; the results are summarized in Table 6.2 below.

a) The pure definite step: Suppose $\rho^{(n+1)} \leq \rho^{(n)}$. Then we apply a definite step without further computations.

b) The first mixed definite step: Suppose $\rho^{(n+1)} > \rho^{(n)}$, so we try to perform the indefinite step, but assume that $d^{(n+2)} = \det(\hat T^{(n,n+2)}) = 0$. Then the definite step must be performed, with the additional costs of a DOT and an MV. Since
$$A q^{(n+1)} = \big( -1/e^{(n)} \big)\, A z^{(n+1)} + \big( \rho^{(n+1)}/\rho^{(n)} \big)\, A q^{(n)},$$
we may recycle the MV. In our numerical experiments, however, this situation never occurred.

c) The second mixed definite step: Suppose $\rho^{(n+1)} > \rho^{(n)}$ and $d^{(m)} \neq 0$, so we try to perform the indefinite step, but now assume that $\rho^{(n+1)} \leq \rho^{(n+2)}$. Then, in addition to the computational costs of the first mixed definite step, we have to compute $z^{(n+2)}$ and $\delta^{(n+2)}$.

d) The indefinite step: This step is performed if $\rho^{(n+1)} > \rho^{(n)}, \rho^{(n+2)}$.

We give the costs for the CGI(2) algorithm as additional work per MV. Note that one MV enlarges the Krylov space dimension by one.

Table 6.2: Additional work per MV for the CGI(2) algorithm.

  step                    DOTs   N-FLOPS
  pure definite             2       7
  first mixed definite      3      10
  second mixed definite     4      14
  pure indefinite           2       6

The CGI(2) algorithm requires memory space for seven vectors ($x^{(n)}$, $q^{(n)}$, $r^{(n)}$, $z^{(n+1)}$, $A q^{(n)}$, $A z^{(n+1)}$, and $z^{(n+2)}$).

For the CGI algorithm we remark that the determinant of $T^{(n,m)}$ can easily be computed with the help of a QR decomposition. This decomposition may be performed effectively with Givens rotations (see, e.g., Golub and van Loan [7, §5.1]).

6.2 The Implementation of the CSCG Algorithm

Let us first state the CSCG algorithm.

The CSCG Algorithm 6.3. Given a symmetric, regular matrix $A$, a right hand side $b$, an initial guess $x^{(0)}$, and a tolerance tol, the CSCG algorithm generates approximations $x^{(n)}$ to the solution of (1.1) such that $\| r^{(L)} \|_2 \leq \mathrm{tol}$ for a certain $L \in \mathbb{N}$.

Set $n = 0$, $r^{(0)} = b - A x^{(0)}$, $q^{(0)} = r^{(0)}$, $w^{(0)} = A q^{(0)}$, and $\rho^{(0)} = (r^{(0)})^T r^{(0)}$.
While $[\sqrt{\rho^{(n)}} > \mathrm{tol}]$ do
  $\theta^{(n)} = (q^{(n)})^T w^{(n)}$;
  $z^{(n+1)} = \theta^{(n)} r^{(n)} - \rho^{(n)} w^{(n)}$;
  $y^{(n+1)} = A z^{(n+1)}$;
  $\delta^{(n+1)} = (z^{(n+1)})^T z^{(n+1)}$;
  $\theta^{(n+1)} = (z^{(n+1)})^T y^{(n+1)}$;
  DEF $= [\sqrt{\delta^{(n+1)}} \leq \sqrt{\rho^{(n)}}\, |\theta^{(n)}|]$.
  If not DEF, then
    $\tilde d^{(n+2)} = \theta^{(n)} \theta^{(n+1)} (\rho^{(n)})^2 - (\delta^{(n+1)})^2$;
    $z^{(n+2)} = \tilde d^{(n+2)} r^{(n)} - (\rho^{(n)})^3 \theta^{(n+1)} w^{(n)} - (\rho^{(n)})^2 \delta^{(n+1)} y^{(n+1)}$;
    $\delta^{(n+2)} = (z^{(n+2)})^T z^{(n+2)}$;
    DEF $= [\sqrt{\delta^{(n+1)}}\, |\tilde d^{(n+2)}| \leq \sqrt{\delta^{(n+2)}}\, |\theta^{(n)}|]$.
  end.
  If DEF, then
    $\alpha^{(n)} = \rho^{(n)}/\theta^{(n)}$;
    $\rho^{(n+1)} = \delta^{(n+1)}/(\theta^{(n)})^2$;
    $\beta^{(n+1)} = \rho^{(n+1)}/\rho^{(n)}$;
    $x^{(n+1)} = x^{(n)} + \alpha^{(n)} q^{(n)}$;
    $r^{(n+1)} = r^{(n)} - \alpha^{(n)} w^{(n)}$  $\big( = (1/\theta^{(n)})\, z^{(n+1)} \big)$;
    $q^{(n+1)} = (1/\theta^{(n)})\, z^{(n+1)} + \beta^{(n+1)} q^{(n)}$;
    $w^{(n+1)} = (1/\theta^{(n)})\, y^{(n+1)} + \beta^{(n+1)} w^{(n)}$;
    $n = n + 1$;
  else
    $\alpha^{(n+1)}_1 = (\rho^{(n)})^3 \theta^{(n+1)}/\tilde d^{(n+2)}$;
    $\alpha^{(n+1)}_2 = (\rho^{(n)})^2 \delta^{(n+1)}/\tilde d^{(n+2)}$;
    $x^{(n+2)} = x^{(n)} + \alpha^{(n+1)}_1 q^{(n)} + \alpha^{(n+1)}_2 z^{(n+1)}$;
    $r^{(n+2)} = r^{(n)} - \alpha^{(n+1)}_1 w^{(n)} - \alpha^{(n+1)}_2 y^{(n+1)}$  $\big( = (1/\tilde d^{(n+2)})\, z^{(n+2)} \big)$;
    $\rho^{(n+2)} = (r^{(n+2)})^T r^{(n+2)}$  $\big( = \delta^{(n+2)}/(\tilde d^{(n+2)})^2 \big)$;
    $\beta^{(n+1)}_1 = \rho^{(n+2)}/\rho^{(n)}$;
    $\beta^{(n+1)}_2 = \rho^{(n+2)} \theta^{(n)}/\delta^{(n+1)}$;
    $q^{(n+2)} = r^{(n+2)} + \beta^{(n+1)}_1 q^{(n)} + \beta^{(n+1)}_2 z^{(n+1)}$;
    $w^{(n+2)} = A q^{(n+2)}$;
    $n = n + 2$;
  end.
end.
Set $L = n$.

The $z^{(j)}$ used in the CSCG algorithm are scaled versions of the $z^{(j)}$ used in the CGI(2) algorithm (at least in exact arithmetic); hence $\tilde d^{(m)} = (\rho^{(n)})^4\, d^{(m)}$. In Bank and Chan [1] we find the following remark (we use our notation): "Note that the vectors $z^{(n+1)}$ are scaled versions of $r^{(n+1)}$ ($z^{(n+1)} = \theta^{(n)} r^{(n+1)}$) when $r^{(n+1)}$ is defined for the definite step. Thus for the definite step, these vectors could be computed from simple rescaling, rather than from the more standard formulae given in algorithm CSBCG." No similar remark concerning the connection of $r^{(n+2)}$ and $z^{(n+2)}$ is made. Moreover, Bank and Chan give the following formula,

$$r^{(n+2)} = r^{(n)} - \alpha^{(n+1)}_1 w^{(n)} - \alpha^{(n+1)}_2 y^{(n+1)}, \qquad (6.4)$$

where $\alpha^{(n+1)}_1 = (\rho^{(n)})^3 \theta^{(n+1)}/\tilde d^{(n+2)}$ and $\alpha^{(n+1)}_2 = (\rho^{(n)})^2 \delta^{(n+1)}/\tilde d^{(n+2)}$. The modification of scaling $z^{(n+2)}$ instead of using the update rule (6.4) saves some computational work and is used in this paper.

For the first part of the definite or indefinite step criterion we need one MV, three DOTs, and 3N-FLOPS. As for the CGI(2) algorithm, we have to distinguish different possibilities to perform a step, each with different computational costs. A summary of the computational costs is given in Table 6.5 below.

a) The pure definite step: Assume that $\rho^{(n+1)} \leq \rho^{(n)}$. Then we perform a definite step.

b) The mixed definite step: Now assume that $\rho^{(n+1)} > \rho^{(n)}$ but $\rho^{(n+1)} \leq \rho^{(n+2)}$. Then we finally perform the definite step.

c) The indefinite step: Suppose that $\rho^{(n+1)} > \rho^{(n)}, \rho^{(n+2)}$. Then the indefinite step has to be performed.

Table 6.5: Additional work per MV for the CSCG algorithm (scaled).

  step              DOTs   N-FLOPS
  pure definite       3      12
  mixed definite      4      17
  pure indefinite

The CSCG algorithm requires memory space for seven vectors ($x^{(n)}$, $q^{(n)}$, $r^{(n)}$, $z^{(n+1)}$, $z^{(n+2)}$, $w^{(n)}$, and $y^{(n+1)}$).
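The saving just described, in code form (our sketch): since $z^{(n+2)} = \tilde d^{(n+2)}\, r^{(n+2)}$ holds by construction, a single scaling of N-FLOPS replaces the two vector updates (4N-FLOPS) of (6.4).

```python
def residual_update_64(r, w, y, a1, a2):
    """Standard composite update (6.4): two AXPYs, i.e. 4N-FLOPS."""
    return r - a1 * w - a2 * y

def residual_by_rescaling(z2, dtilde):
    """Paper's variant: z^{(n+2)} = dtilde * r^{(n+2)} already holds,
    so one scaling (N-FLOPS) recovers the residual."""
    return z2 / dtilde
```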

7 Numerical Experiments

In this section we present some numerical experiments. The algorithms are implemented in MATLAB [13]. The computations are performed on a SUN workstation. In our experiments we worked with $A - \lambda I$, where $\lambda \in \mathbb{R}$ is chosen so that $A - \lambda I$ becomes more or less indefinite. In particular, we used values $\lambda$ for which $T^{(10)}$ is expected to be singular. In our experiments we set $b := A\, (1, \ldots, 1)^T$ and $x^{(0)} = 0$.

Our tables show the shift ($\lambda$), the number of negative eigenvalues ($-E$), and the number of iterations (#iter) needed to bring the norm of the residual below a prescribed tolerance, i.e.,
$$\#\mathrm{iter} := \min\{\, n :\ \| b - (A - \lambda I)\, x^{(n)} \| \leq \mathrm{tol} \,\},$$
where tol $= 10^{-8}$. If an algorithm failed to converge within $2N$ steps, we show $(2N)$ instead. For representation reasons we use the notation $x[y] := x \cdot 10^y$. For the CGI algorithm we also show the maximum block length ($m$) which occurred during the iteration.

Example 7.1. In our first examples we work with $A_E$, a diagonal matrix with $N = 1000$ equidistant eigenvalues in the interval $[1, 3]$, i.e., $\lambda_j = 1 + 2(N - j)/(N - 1)$, $j = 1, \ldots, N$. The results are given in Table 7.1. This example is related to Paige, Parlett, and van der Vorst [14]. These examples show no significant difference between the CSCG, CGI(2), and CGI algorithms; all three methods needed approximately the same number of steps. Note that the CG algorithm failed to converge when $T^{(10)}$ is supposed to be singular. Due to rounding errors the CG algorithm did not break down; however, although no breakdown occurred, the method failed to converge. For some arbitrary shift (e.g., $\lambda = 1.25$) the CG algorithm needed many more steps until convergence.

Table 7.1: Iteration history for $A_E$, tol $= 10^{-8}$ (columns: $\lambda$, $-E$, #iter for CG, CSCG, CGI(2), and CGI, and the maximum block length $m$).

Example 7.2. We chose $A_C$ to be a diagonal matrix with $N = 1000$ eigenvalues distributed as the zeros of the $N$th Chebyshev polynomial over $[1, 3]$, i.e., $\lambda_j = 2 + \cos\big( \pi (N - j)/(N - 1) \big)$, $j = 1, \ldots, N$. The results are given in Table 7.2. Again, we find no difference between the CSCG, CGI(2), and CGI algorithms (except for $\lambda = 1.5$), and the CG algorithm failed to converge or needed many more steps. For $\lambda = 1.5$, none of the algorithms led to satisfactory results; however, $A_C - 1.5\, I$ is singular up to working precision. Note that although the eigenvalue distribution of the matrix $A_C$ is as bad as possible, the algorithms performed much better than in Example 7.1.

Table 7.2: Iteration history for $A_C$, tol $= 10^{-8}$ (same layout as Table 7.1).
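A sketch of the test problems of Examples 7.1 and 7.2 (the solver call assumes the `cgi2` sketch from Section 6; names and the particular shift are ours):

```python
import numpy as np

N = 1000
j = np.arange(1, N + 1)
A_E = np.diag(1.0 + 2.0 * (N - j) / (N - 1))             # equidistant in [1, 3]
A_C = np.diag(2.0 + np.cos(np.pi * (N - j) / (N - 1)))   # Chebyshev-type in [1, 3]

lam = 1.25                               # shift making A - lam*I indefinite
M = A_E - lam * np.eye(N)
b = A_E @ np.ones(N)                     # b := A (1, ..., 1)^T as above
x, iters = cgi2(M, b, np.zeros(N), tol=1e-8)
print(iters, np.linalg.norm(b - M @ x))
```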

Example 7.3. We took $A_S$, a diagonal matrix with $N/2 = 500$ equidistant eigenvalues in the interval $[1, 1.5]$ and $N/2$ equidistant eigenvalues in the interval $[2.5, 3]$, i.e., $\lambda_j = 1 + 0.5\, (N - 2j)/(N - 2)$, $j = 1, \ldots, N/2$, and $\lambda_j = 2.5 + (N - j)/(N - 2)$, $j = N/2 + 1, \ldots, N$. The results are given in Table 7.3. Similar examples are considered by Saad [16]. The CGI(2) algorithm and the CSCG algorithm needed approximately the same number of steps, whereas the CGI algorithm needed some additional steps. The CG algorithm is for many shifts as effective as its competitors.

Table 7.3: Iteration history for $A_S$, tol $= 10^{-8}$ (same layout as Table 7.1).

Example 7.4. We considered $A_V$, a diagonal matrix with the eigenvalues $\lambda_{1,2} = \pm\varepsilon$ and $\lambda_j = 7 + j$, $j = 3, \ldots, N = 1000$, with $x^{(0)} = 0$ and $b = (10, 10, 1, \ldots, 1)^T$. The results for $\varepsilon = 10^{-P}$, $P = 0, \ldots, 4$, are given in Table 7.4 (see also Figure 10.1). These examples show that the CGI algorithms performed much better than the CG or the CSCG algorithm. For this example, the smallest Ritz value converges initially to zero (cf. van der Sluis and van der Vorst [21]). Thus, for small $n$, $T^{(n)}$ is close to singular.

Table 7.4: Iteration history for $A_V$, tol $= 10^{-8}$ (columns: $P$, $-E$, #iter for CG, CSCG, CGI(2), and CGI, and $m$).

Example 7.5. We treated a linear system of equations where the system matrix $A_H^P$ is a discrete version of the Helmholtz differential operator,
$$-\Delta u(x) - P\, u(x) = f(x), \quad x \in \Omega, \qquad u(x) = g(x), \quad x \in \partial\Omega.$$
Here, $\Omega \subset \mathbb{R}^2$ is chosen to be the unit square, i.e., $0 \leq x_1, x_2 \leq 1$, and the parameter is $P = 100\, j$, $j = 0, \ldots, 5$ (cf. Stoer and Freund [18]). We discretized the equation on a regular mesh, so that the arising system matrix is block tridiagonal. The results are given in Table 7.5. For these examples, all algorithms needed about the same number of iterations to converge. However, different steps are performed, resulting in different computational costs. Table 7.6 shows the normalized number of floating point operations needed to bring the residual norm below the prescribed tolerance,
$$\mathrm{FLOPS} := \frac{1}{N} \sum_{n=0}^{\#\mathrm{iter}} \big( \text{FLOPS per step } n \big).$$
This shows that the CG algorithm is the cheapest algorithm if one asks for FLOPS. The CGI algorithms are a little more expensive but much cheaper than the CSCG algorithm.
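A sketch of a discrete Helmholtz operator as in Example 7.5 (our construction via Kronecker sums; the paper does not specify the mesh size, so the value of `k` below is arbitrary):

```python
import numpy as np

def helmholtz_matrix(k, P):
    """5-point discretization of -Laplace(u) - P*u on the unit square,
    scaled by h^2; k interior grid points per direction, N = k*k."""
    h = 1.0 / (k + 1)
    I = np.eye(k)
    T = 2.0 * I - np.diag(np.ones(k - 1), 1) - np.diag(np.ones(k - 1), -1)
    L = np.kron(I, T) + np.kron(T, I)        # block tridiagonal Laplacian (times h^2)
    return L - P * h**2 * np.eye(k * k)

A = helmholtz_matrix(32, 200.0)              # N = 1024, indefinite for large P
```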

Note that especially for $P = 0$, where $A_H^P$ is positive definite, the CSCG algorithm is much more expensive than the others. For this example no mixed definite and only four indefinite steps are performed. For $P = 200$ we show the iteration history (cf. Figure 10.2), i.e., the logarithm of the norm of the relative residuals versus the number of iterations, and the FLOPS (cf. Figure 10.3), i.e., the logarithm of the norm of the relative residual versus the number of FLOPS.

Table 7.5: Iteration history for $A_H^P$, tol $= 10^{-8}$ (columns: $P$, $-E$, #iter for CG, CSCG, CGI(2), and CGI, and $m$).

Table 7.6: FLOPS for $A_H^P$, tol $= 10^{-8}$ (columns: $P$, $-E$, FLOPS for CG, CSCG, CGI(2), and CGI).

8 Conclusion

We have shown the reasons for a breakdown of the CG algorithm when applied to a symmetric, indefinite system. If the orthogonal polynomial $\pi^{(n)}$ has a root at zero, the Lanczos matrix $T^{(n)}$ is singular, no residual polynomial of degree $n$ and no formally orthogonal polynomial of degree $n$ exist, and the CG algorithm breaks down. In case of a near breakdown, i.e., when the distance between an eigenvalue of $T^{(n)}$ and zero is small, the CG recurrence is unstable.

We proposed the CGI algorithms for symmetric, indefinite systems. Here, the orthogonal polynomials are computed using a Stieltjes type algorithm. An a priori check whether to use the short or the more stable recurrences is performed. For the general CGI algorithm the residual norm is monotonically non-increasing, whereas for the CGI(2) algorithm we only avoid the CG breakdown. Thus, we may expect the CGI algorithm to be more stable than the CGI(2) algorithm. However, our examples show that the CGI(2) algorithm is as stable as the CGI algorithm.

We showed that the CGI(2) algorithm is mathematically equivalent to a modification of the CSBCG algorithm of Bank and Chan [2]. The CGI(2) algorithm needs about the same number of steps as the CSCG algorithm, but is cheaper per step. Moreover, our examples suggest that the CGI(2) algorithm is more stable than the CSCG algorithm (cf. Example 7.4).

9 Acknowledgements

Parts of these ideas were developed while I was a guest at the University of Utrecht. I would like to thank Henk A. van der Vorst for his hospitality. Talks with Bernd Fischer, Gerard L. G. Sleijpen, and Henk A. van der Vorst were always very instructive.

10 References

[1] R. E. Bank and T. F. Chan, A composite step bi-conjugate gradient algorithm for nonsymmetric linear systems, Tech. Rep., Dept. of Math., University of California, Los Angeles, CA, 1992.

[2] R. E. Bank and T. F. Chan, An analysis of the composite step biconjugate gradient method, Numer. Math., 66 (1993), pp. 295–319.
[3] R. Barrett, M. Berry, T. F. Chan, J. W. Demmel, J. Donato, J. Dongarra, V. Eijkhout, R. Pozo, C. Romine, and H. A. van der Vorst, TEMPLATES for the Solution of Linear Systems: Building Blocks for Iterative Methods, SIAM, Philadelphia, 1994.
[4] B. Fischer, Orthogonal polynomials and polynomial based iteration methods for indefinite linear systems, Habilitationsschrift, University of Hamburg, FRG.
[5] R. Fletcher, Conjugate gradient methods for indefinite systems, in Numerical Analysis, Dundee 1975, G. Watson, ed., Lecture Notes in Mathematics 506, Springer, Heidelberg, 1976, pp. 73–89.
[6] W. Gautschi, A survey of Gauss-Christoffel quadrature formulae, in E. B. Christoffel: The Influence of His Work in Mathematics and the Physical Sciences, P. Butzer and F. Fehér, eds., Birkhäuser, Basel, 1981, pp. 72–147.
[7] G. H. Golub and C. F. van Loan, Matrix Computations, The Johns Hopkins University Press, Baltimore, second ed., 1989.
[8] M. H. Gutknecht, A completed theory of the unsymmetric Lanczos process and related algorithms, Part I, SIAM J. Matrix Anal. Appl., 13 (1992), pp. 594–639.
[9] M. R. Hestenes and E. Stiefel, Methods of conjugate gradients for solving linear systems, J. Res. Nat. Bur. Stand., 49 (1952), pp. 409–436.
[10] C. Lanczos, An iteration method for the solution of the eigenvalue problem of linear differential and integral operators, J. Res. Natl. Bur. Stand., 45 (1950), pp. 255–282.
[11] D. G. Luenberger, Hyperbolic pairs in the method of conjugate gradients, SIAM J. Appl. Math., 17 (1969), pp. 1263–1267.
[12] J. Modersitzki, CGI — an extension of the conjugate gradient algorithm to indefinite symmetric matrices, Preprint, Reihe A 77, Institute of Applied Mathematics, University of Hamburg, Nov.
[13] C. B. Moler, J. Little, S. Bangert, and S. Kleiman, ProMatlab User's Guide, Sherborn, MA.
[14] C. C. Paige, B. N. Parlett, and H. A. van der Vorst, Approximate solutions and eigenvalue bounds from Krylov subspaces, pp. 1–19. To appear in Numer. Lin. Alg. Appl.
[15] C. C. Paige and M. A. Saunders, Solution of sparse indefinite systems of linear equations, SIAM J. Numer. Anal., 12 (1975), pp. 617–629.
[16] Y. Saad, Iterative solution of indefinite symmetric linear systems by methods using orthogonal polynomials over two disjoint intervals, SIAM J. Numer. Anal., 20 (1983), pp. 784–811.
[17] G. L. G. Sleijpen, H. A. van der Vorst, and D. R. Fokkema, BiCGstab(l) and other hybrid Bi-CG methods, Numer. Alg., 7 (1994), pp. 75–109.
[18] J. Stoer and R. W. Freund, On the solution of large linear systems of equations by conjugate gradient algorithms, in Computing Methods in Applied Sciences and Engineering V, R. Glowinski and J. L. Lions, eds., North-Holland, Amsterdam, 1982, pp. 35–53.
[19] G. W. Struble, Orthogonal polynomials: variable-signed weight functions, Numer. Math., 5 (1963), pp. 88–94.
[20] G. Szegő, Orthogonal Polynomials, AMS Colloquium Publications XXIII, American Mathematical Society, New York, revised ed.
[21] A. van der Sluis and H. A. van der Vorst, The convergence behavior of Ritz values in the presence of close eigenvalues, Linear Alg. Appl., 88/89 (1987), pp. 651–694.

Figure 10.1: Iteration history for $A_V$ (log10 of relative residual norms versus iteration number; CG, CSCG, CGI(2), CGI).

Figure 10.2: Iteration history for $A_H^P$, $P = 200$ (log10 of relative residual norms versus iteration number; CG, CSCG, CGI(2), CGI).

Figure 10.3: FLOPS for $A_H^P$, $P = 200$ (log10 of relative residual norms versus number of FLOPS; CG, CSCG, CGI(2), CGI).


More information

The solution of the discretized incompressible Navier-Stokes equations with iterative methods

The solution of the discretized incompressible Navier-Stokes equations with iterative methods The solution of the discretized incompressible Navier-Stokes equations with iterative methods Report 93-54 C. Vuik Technische Universiteit Delft Delft University of Technology Faculteit der Technische

More information

Sensitivity of Gauss-Christoffel quadrature and sensitivity of Jacobi matrices to small changes of spectral data

Sensitivity of Gauss-Christoffel quadrature and sensitivity of Jacobi matrices to small changes of spectral data Sensitivity of Gauss-Christoffel quadrature and sensitivity of Jacobi matrices to small changes of spectral data Zdeněk Strakoš Academy of Sciences and Charles University, Prague http://www.cs.cas.cz/

More information

A HARMONIC RESTARTED ARNOLDI ALGORITHM FOR CALCULATING EIGENVALUES AND DETERMINING MULTIPLICITY

A HARMONIC RESTARTED ARNOLDI ALGORITHM FOR CALCULATING EIGENVALUES AND DETERMINING MULTIPLICITY A HARMONIC RESTARTED ARNOLDI ALGORITHM FOR CALCULATING EIGENVALUES AND DETERMINING MULTIPLICITY RONALD B. MORGAN AND MIN ZENG Abstract. A restarted Arnoldi algorithm is given that computes eigenvalues

More information

Fast iterative solvers

Fast iterative solvers Utrecht, 15 november 2017 Fast iterative solvers Gerard Sleijpen Department of Mathematics http://www.staff.science.uu.nl/ sleij101/ Review. To simplify, take x 0 = 0, assume b 2 = 1. Solving Ax = b for

More information

Universiteit-Utrecht. Department. of Mathematics. Jacobi-Davidson algorithms for various. eigenproblems. - A working document -

Universiteit-Utrecht. Department. of Mathematics. Jacobi-Davidson algorithms for various. eigenproblems. - A working document - Universiteit-Utrecht * Department of Mathematics Jacobi-Davidson algorithms for various eigenproblems - A working document - by Gerard L.G. Sleipen, Henk A. Van der Vorst, and Zhaoun Bai Preprint nr. 1114

More information

Introduction to Iterative Solvers of Linear Systems

Introduction to Iterative Solvers of Linear Systems Introduction to Iterative Solvers of Linear Systems SFB Training Event January 2012 Prof. Dr. Andreas Frommer Typeset by Lukas Krämer, Simon-Wolfgang Mages and Rudolf Rödl 1 Classes of Matrices and their

More information

Non-stationary extremal eigenvalue approximations in iterative solutions of linear systems and estimators for relative error

Non-stationary extremal eigenvalue approximations in iterative solutions of linear systems and estimators for relative error on-stationary extremal eigenvalue approximations in iterative solutions of linear systems and estimators for relative error Divya Anand Subba and Murugesan Venatapathi* Supercomputer Education and Research

More information

The purpose of the second Lanczos paper on the subject was \to adopt the general principles of the previous investigation to the specic demands that a

The purpose of the second Lanczos paper on the subject was \to adopt the general principles of the previous investigation to the specic demands that a THE UNSYMMETRIC LANCZOS ALGORITHMS AND THEIR RELATIONS TO PADE APPROXIMATION, CONTINUED FRACTIONS, AND THE QD ALGORITHM MARTIN H. GUTKNECHT Abstract. First, several algorithms based on the unsymmetric

More information

Arnoldi Methods in SLEPc

Arnoldi Methods in SLEPc Scalable Library for Eigenvalue Problem Computations SLEPc Technical Report STR-4 Available at http://slepc.upv.es Arnoldi Methods in SLEPc V. Hernández J. E. Román A. Tomás V. Vidal Last update: October,

More information

On the influence of eigenvalues on Bi-CG residual norms

On the influence of eigenvalues on Bi-CG residual norms On the influence of eigenvalues on Bi-CG residual norms Jurjen Duintjer Tebbens Institute of Computer Science Academy of Sciences of the Czech Republic duintjertebbens@cs.cas.cz Gérard Meurant 30, rue

More information

Positive Denite Matrix. Ya Yan Lu 1. Department of Mathematics. City University of Hong Kong. Kowloon, Hong Kong. Abstract

Positive Denite Matrix. Ya Yan Lu 1. Department of Mathematics. City University of Hong Kong. Kowloon, Hong Kong. Abstract Computing the Logarithm of a Symmetric Positive Denite Matrix Ya Yan Lu Department of Mathematics City University of Hong Kong Kowloon, Hong Kong Abstract A numerical method for computing the logarithm

More information

On the Superlinear Convergence of MINRES. Valeria Simoncini and Daniel B. Szyld. Report January 2012

On the Superlinear Convergence of MINRES. Valeria Simoncini and Daniel B. Szyld. Report January 2012 On the Superlinear Convergence of MINRES Valeria Simoncini and Daniel B. Szyld Report 12-01-11 January 2012 This report is available in the World Wide Web at http://www.math.temple.edu/~szyld 0 Chapter

More information

Iterative methods for symmetric eigenvalue problems

Iterative methods for symmetric eigenvalue problems s Iterative s for symmetric eigenvalue problems, PhD McMaster University School of Computational Engineering and Science February 11, 2008 s 1 The power and its variants Inverse power Rayleigh quotient

More information

Universiteit-Utrecht. Department. of Mathematics. The convergence of Jacobi-Davidson for. Hermitian eigenproblems. Jasper van den Eshof.

Universiteit-Utrecht. Department. of Mathematics. The convergence of Jacobi-Davidson for. Hermitian eigenproblems. Jasper van den Eshof. Universiteit-Utrecht * Department of Mathematics The convergence of Jacobi-Davidson for Hermitian eigenproblems by Jasper van den Eshof Preprint nr. 1165 November, 2000 THE CONVERGENCE OF JACOBI-DAVIDSON

More information

Exponentials of Symmetric Matrices through Tridiagonal Reductions

Exponentials of Symmetric Matrices through Tridiagonal Reductions Exponentials of Symmetric Matrices through Tridiagonal Reductions Ya Yan Lu Department of Mathematics City University of Hong Kong Kowloon, Hong Kong Abstract A simple and efficient numerical algorithm

More information

A Finite Element Method for an Ill-Posed Problem. Martin-Luther-Universitat, Fachbereich Mathematik/Informatik,Postfach 8, D Halle, Abstract

A Finite Element Method for an Ill-Posed Problem. Martin-Luther-Universitat, Fachbereich Mathematik/Informatik,Postfach 8, D Halle, Abstract A Finite Element Method for an Ill-Posed Problem W. Lucht Martin-Luther-Universitat, Fachbereich Mathematik/Informatik,Postfach 8, D-699 Halle, Germany Abstract For an ill-posed problem which has its origin

More information

Sparsity-Preserving Difference of Positive Semidefinite Matrix Representation of Indefinite Matrices

Sparsity-Preserving Difference of Positive Semidefinite Matrix Representation of Indefinite Matrices Sparsity-Preserving Difference of Positive Semidefinite Matrix Representation of Indefinite Matrices Jaehyun Park June 1 2016 Abstract We consider the problem of writing an arbitrary symmetric matrix as

More information

GMRESR: A family of nested GMRES methods

GMRESR: A family of nested GMRES methods GMRESR: A family of nested GMRES methods Report 91-80 H.A. van der Vorst C. Vui Technische Universiteit Delft Delft University of Technology Faculteit der Technische Wisunde en Informatica Faculty of Technical

More information

Lecture 8 Fast Iterative Methods for linear systems of equations

Lecture 8 Fast Iterative Methods for linear systems of equations CGLS Select x 0 C n x = x 0, r = b Ax 0 u = 0, ρ = 1 while r 2 > tol do s = A r ρ = ρ, ρ = s s, β = ρ/ρ u s βu, c = Au σ = c c, α = ρ/σ r r αc x x+αu end while Graig s method Select x 0 C n x = x 0, r

More information

AN ITERATIVE METHOD WITH ERROR ESTIMATORS

AN ITERATIVE METHOD WITH ERROR ESTIMATORS AN ITERATIVE METHOD WITH ERROR ESTIMATORS D. CALVETTI, S. MORIGI, L. REICHEL, AND F. SGALLARI Abstract. Iterative methods for the solution of linear systems of equations produce a sequence of approximate

More information

ECS231 Handout Subspace projection methods for Solving Large-Scale Eigenvalue Problems. Part I: Review of basic theory of eigenvalue problems

ECS231 Handout Subspace projection methods for Solving Large-Scale Eigenvalue Problems. Part I: Review of basic theory of eigenvalue problems ECS231 Handout Subspace projection methods for Solving Large-Scale Eigenvalue Problems Part I: Review of basic theory of eigenvalue problems 1. Let A C n n. (a) A scalar λ is an eigenvalue of an n n A

More information

Conjugate gradient method. Descent method. Conjugate search direction. Conjugate Gradient Algorithm (294)

Conjugate gradient method. Descent method. Conjugate search direction. Conjugate Gradient Algorithm (294) Conjugate gradient method Descent method Hestenes, Stiefel 1952 For A N N SPD In exact arithmetic, solves in N steps In real arithmetic No guaranteed stopping Often converges in many fewer than N steps

More information

Linear Algebra (part 1) : Vector Spaces (by Evan Dummit, 2017, v. 1.07) 1.1 The Formal Denition of a Vector Space

Linear Algebra (part 1) : Vector Spaces (by Evan Dummit, 2017, v. 1.07) 1.1 The Formal Denition of a Vector Space Linear Algebra (part 1) : Vector Spaces (by Evan Dummit, 2017, v. 1.07) Contents 1 Vector Spaces 1 1.1 The Formal Denition of a Vector Space.................................. 1 1.2 Subspaces...................................................

More information

ITERATIVE PROJECTION METHODS FOR SPARSE LINEAR SYSTEMS AND EIGENPROBLEMS CHAPTER 1 : INTRODUCTION

ITERATIVE PROJECTION METHODS FOR SPARSE LINEAR SYSTEMS AND EIGENPROBLEMS CHAPTER 1 : INTRODUCTION ITERATIVE PROJECTION METHODS FOR SPARSE LINEAR SYSTEMS AND EIGENPROBLEMS CHAPTER 1 : INTRODUCTION Heinrich Voss voss@tu-harburg.de Hamburg University of Technology Institute of Numerical Simulation TUHH

More information

Gradient Method Based on Roots of A

Gradient Method Based on Roots of A Journal of Scientific Computing, Vol. 15, No. 4, 2000 Solving Ax Using a Modified Conjugate Gradient Method Based on Roots of A Paul F. Fischer 1 and Sigal Gottlieb 2 Received January 23, 2001; accepted

More information

Solving Sparse Linear Systems: Iterative methods

Solving Sparse Linear Systems: Iterative methods Scientific Computing with Case Studies SIAM Press, 2009 http://www.cs.umd.edu/users/oleary/sccs Lecture Notes for Unit VII Sparse Matrix Computations Part 2: Iterative Methods Dianne P. O Leary c 2008,2010

More information

Solving Sparse Linear Systems: Iterative methods

Solving Sparse Linear Systems: Iterative methods Scientific Computing with Case Studies SIAM Press, 2009 http://www.cs.umd.edu/users/oleary/sccswebpage Lecture Notes for Unit VII Sparse Matrix Computations Part 2: Iterative Methods Dianne P. O Leary

More information

Chebyshev semi-iteration in Preconditioning

Chebyshev semi-iteration in Preconditioning Report no. 08/14 Chebyshev semi-iteration in Preconditioning Andrew J. Wathen Oxford University Computing Laboratory Tyrone Rees Oxford University Computing Laboratory Dedicated to Victor Pereyra on his

More information

Fraction-free Row Reduction of Matrices of Skew Polynomials

Fraction-free Row Reduction of Matrices of Skew Polynomials Fraction-free Row Reduction of Matrices of Skew Polynomials Bernhard Beckermann Laboratoire d Analyse Numérique et d Optimisation Université des Sciences et Technologies de Lille France bbecker@ano.univ-lille1.fr

More information

On the Preconditioning of the Block Tridiagonal Linear System of Equations

On the Preconditioning of the Block Tridiagonal Linear System of Equations On the Preconditioning of the Block Tridiagonal Linear System of Equations Davod Khojasteh Salkuyeh Department of Mathematics, University of Mohaghegh Ardabili, PO Box 179, Ardabil, Iran E-mail: khojaste@umaacir

More information

LARGE SPARSE EIGENVALUE PROBLEMS. General Tools for Solving Large Eigen-Problems

LARGE SPARSE EIGENVALUE PROBLEMS. General Tools for Solving Large Eigen-Problems LARGE SPARSE EIGENVALUE PROBLEMS Projection methods The subspace iteration Krylov subspace methods: Arnoldi and Lanczos Golub-Kahan-Lanczos bidiagonalization General Tools for Solving Large Eigen-Problems

More information

Using Godunov s Two-Sided Sturm Sequences to Accurately Compute Singular Vectors of Bidiagonal Matrices.

Using Godunov s Two-Sided Sturm Sequences to Accurately Compute Singular Vectors of Bidiagonal Matrices. Using Godunov s Two-Sided Sturm Sequences to Accurately Compute Singular Vectors of Bidiagonal Matrices. A.M. Matsekh E.P. Shurina 1 Introduction We present a hybrid scheme for computing singular vectors

More information

Block Bidiagonal Decomposition and Least Squares Problems

Block Bidiagonal Decomposition and Least Squares Problems Block Bidiagonal Decomposition and Least Squares Problems Åke Björck Department of Mathematics Linköping University Perspectives in Numerical Analysis, Helsinki, May 27 29, 2008 Outline Bidiagonal Decomposition

More information

Topics. The CG Algorithm Algorithmic Options CG s Two Main Convergence Theorems

Topics. The CG Algorithm Algorithmic Options CG s Two Main Convergence Theorems Topics The CG Algorithm Algorithmic Options CG s Two Main Convergence Theorems What about non-spd systems? Methods requiring small history Methods requiring large history Summary of solvers 1 / 52 Conjugate

More information

The Best Circulant Preconditioners for Hermitian Toeplitz Systems II: The Multiple-Zero Case Raymond H. Chan Michael K. Ng y Andy M. Yip z Abstract In

The Best Circulant Preconditioners for Hermitian Toeplitz Systems II: The Multiple-Zero Case Raymond H. Chan Michael K. Ng y Andy M. Yip z Abstract In The Best Circulant Preconditioners for Hermitian Toeplitz Systems II: The Multiple-ero Case Raymond H. Chan Michael K. Ng y Andy M. Yip z Abstract In [0, 4], circulant-type preconditioners have been proposed

More information

Iterative Methods for Sparse Linear Systems

Iterative Methods for Sparse Linear Systems Iterative Methods for Sparse Linear Systems Luca Bergamaschi e-mail: berga@dmsa.unipd.it - http://www.dmsa.unipd.it/ berga Department of Mathematical Methods and Models for Scientific Applications University

More information

The convergence of iterative solution methods for. A.J. Wathen B. Fischer D.J. Silvester

The convergence of iterative solution methods for. A.J. Wathen B. Fischer D.J. Silvester Report no. 97/16 The convergence of iterative solution methods for symmetric and indenite linear systems A.J. Wathen B. Fischer D.J. Silvester Iterative solution methods provide the only feasible alternative

More information

LARGE SPARSE EIGENVALUE PROBLEMS

LARGE SPARSE EIGENVALUE PROBLEMS LARGE SPARSE EIGENVALUE PROBLEMS Projection methods The subspace iteration Krylov subspace methods: Arnoldi and Lanczos Golub-Kahan-Lanczos bidiagonalization 14-1 General Tools for Solving Large Eigen-Problems

More information

Math Introduction to Numerical Analysis - Class Notes. Fernando Guevara Vasquez. Version Date: January 17, 2012.

Math Introduction to Numerical Analysis - Class Notes. Fernando Guevara Vasquez. Version Date: January 17, 2012. Math 5620 - Introduction to Numerical Analysis - Class Notes Fernando Guevara Vasquez Version 1990. Date: January 17, 2012. 3 Contents 1. Disclaimer 4 Chapter 1. Iterative methods for solving linear systems

More information

Research Article Some Generalizations and Modifications of Iterative Methods for Solving Large Sparse Symmetric Indefinite Linear Systems

Research Article Some Generalizations and Modifications of Iterative Methods for Solving Large Sparse Symmetric Indefinite Linear Systems Abstract and Applied Analysis Article ID 237808 pages http://dxdoiorg/055/204/237808 Research Article Some Generalizations and Modifications of Iterative Methods for Solving Large Sparse Symmetric Indefinite

More information

Computational Linear Algebra

Computational Linear Algebra Computational Linear Algebra PD Dr. rer. nat. habil. Ralf Peter Mundani Computation in Engineering / BGU Scientific Computing in Computer Science / INF Winter Term 2017/18 Part 3: Iterative Methods PD

More information

Solution of eigenvalue problems. Subspace iteration, The symmetric Lanczos algorithm. Harmonic Ritz values, Jacobi-Davidson s method

Solution of eigenvalue problems. Subspace iteration, The symmetric Lanczos algorithm. Harmonic Ritz values, Jacobi-Davidson s method Solution of eigenvalue problems Introduction motivation Projection methods for eigenvalue problems Subspace iteration, The symmetric Lanczos algorithm Nonsymmetric Lanczos procedure; Implicit restarts

More information

Preliminary/Qualifying Exam in Numerical Analysis (Math 502a) Spring 2012

Preliminary/Qualifying Exam in Numerical Analysis (Math 502a) Spring 2012 Instructions Preliminary/Qualifying Exam in Numerical Analysis (Math 502a) Spring 2012 The exam consists of four problems, each having multiple parts. You should attempt to solve all four problems. 1.

More information

A MULTIGRID ALGORITHM FOR. Richard E. Ewing and Jian Shen. Institute for Scientic Computation. Texas A&M University. College Station, Texas SUMMARY

A MULTIGRID ALGORITHM FOR. Richard E. Ewing and Jian Shen. Institute for Scientic Computation. Texas A&M University. College Station, Texas SUMMARY A MULTIGRID ALGORITHM FOR THE CELL-CENTERED FINITE DIFFERENCE SCHEME Richard E. Ewing and Jian Shen Institute for Scientic Computation Texas A&M University College Station, Texas SUMMARY In this article,

More information

Solving large scale eigenvalue problems

Solving large scale eigenvalue problems arge scale eigenvalue problems, Lecture 4, March 14, 2018 1/41 Lecture 4, March 14, 2018: The QR algorithm http://people.inf.ethz.ch/arbenz/ewp/ Peter Arbenz Computer Science Department, ETH Zürich E-mail:

More information

Institute for Advanced Computer Studies. Department of Computer Science. Iterative methods for solving Ax = b. GMRES/FOM versus QMR/BiCG

Institute for Advanced Computer Studies. Department of Computer Science. Iterative methods for solving Ax = b. GMRES/FOM versus QMR/BiCG University of Maryland Institute for Advanced Computer Studies Department of Computer Science College Park TR{96{2 TR{3587 Iterative methods for solving Ax = b GMRES/FOM versus QMR/BiCG Jane K. Cullum

More information

APPLIED NUMERICAL LINEAR ALGEBRA

APPLIED NUMERICAL LINEAR ALGEBRA APPLIED NUMERICAL LINEAR ALGEBRA James W. Demmel University of California Berkeley, California Society for Industrial and Applied Mathematics Philadelphia Contents Preface 1 Introduction 1 1.1 Basic Notation

More information

The quadratic eigenvalue problem (QEP) is to find scalars λ and nonzero vectors u satisfying

The quadratic eigenvalue problem (QEP) is to find scalars λ and nonzero vectors u satisfying I.2 Quadratic Eigenvalue Problems 1 Introduction The quadratic eigenvalue problem QEP is to find scalars λ and nonzero vectors u satisfying where Qλx = 0, 1.1 Qλ = λ 2 M + λd + K, M, D and K are given

More information

Lecture Note 7: Iterative methods for solving linear systems. Xiaoqun Zhang Shanghai Jiao Tong University

Lecture Note 7: Iterative methods for solving linear systems. Xiaoqun Zhang Shanghai Jiao Tong University Lecture Note 7: Iterative methods for solving linear systems Xiaoqun Zhang Shanghai Jiao Tong University Last updated: December 24, 2014 1.1 Review on linear algebra Norms of vectors and matrices vector

More information

A general Krylov method for solving symmetric systems of linear equations

A general Krylov method for solving symmetric systems of linear equations A general Krylov method for solving symmetric systems of linear equations Anders FORSGREN Tove ODLAND Technical Report TRITA-MAT-214-OS-1 Department of Mathematics KTH Royal Institute of Technology March

More information

Alternative correction equations in the Jacobi-Davidson method

Alternative correction equations in the Jacobi-Davidson method Chapter 2 Alternative correction equations in the Jacobi-Davidson method Menno Genseberger and Gerard Sleijpen Abstract The correction equation in the Jacobi-Davidson method is effective in a subspace

More information

On the accuracy of saddle point solvers

On the accuracy of saddle point solvers On the accuracy of saddle point solvers Miro Rozložník joint results with Valeria Simoncini and Pavel Jiránek Institute of Computer Science, Czech Academy of Sciences, Prague, Czech Republic Seminar at

More information

Solution of eigenvalue problems. Subspace iteration, The symmetric Lanczos algorithm. Harmonic Ritz values, Jacobi-Davidson s method

Solution of eigenvalue problems. Subspace iteration, The symmetric Lanczos algorithm. Harmonic Ritz values, Jacobi-Davidson s method Solution of eigenvalue problems Introduction motivation Projection methods for eigenvalue problems Subspace iteration, The symmetric Lanczos algorithm Nonsymmetric Lanczos procedure; Implicit restarts

More information