4.6 Iterative Solvers for Linear Systems

Why use iterative methods? Virtually all direct methods for solving $Ax = b$ require $O(n^3)$ floating point operations. In practical applications the matrix $A$ often has a very special structure and possibly contains many zero entries (is sparse). It is especially in these cases that iterative methods pay off. As an example, recall the discretized Poisson problem from the very first class meeting. Certain iterative methods were mentioned that can solve the resulting linear system in $O(n)$ operations. We now study some of these methods in detail.

Classical Iterative Solvers

We begin with a simple example.

Example: Consider the (iterative) solution of the linear system (see also the Maple worksheet 577 Iterative Solvers.mws)
\[
\begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
=
\begin{bmatrix} 6 \\ 6 \end{bmatrix}.
\]
The simplest approach is to use a fixed-point iteration scheme. Thus, we solve the first equation for $x_1$ and the second equation for $x_2$, and start the iteration with some initial guess $x^{(0)} = [x_1^{(0)}\ x_2^{(0)}]^T$. This results in
\[
x_1^{(k)} = \bigl(6 - x_2^{(k-1)}\bigr)/2, \qquad
x_2^{(k)} = \bigl(6 - x_1^{(k-1)}\bigr)/2, \qquad k = 1, 2, 3, \ldots.
\]
The method just described is known as Jacobi iteration, and a general algorithm is given by

Algorithm (Jacobi iteration)

Input $A$, $b$, $x$, $n$, $M$
for $k = 1$ to $M$ do
    for $i = 1$ to $n$ do
        $u_i = \bigl(b_i - \sum_{j \ne i} a_{ij} x_j\bigr)/a_{ii}$
    for $i = 1$ to $n$ do
        $x_i = u_i$
Output $x$
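To make the algorithm concrete, here is a minimal NumPy sketch (our own illustration, not part of the original notes; the function name and the stopping test on successive iterates are assumptions) of the Jacobi iteration applied to the $2 \times 2$ example:

import numpy as np

def jacobi(A, b, x0, M=100, tol=1e-10):
    """Jacobi iteration for Ax = b (a sketch of the algorithm above)."""
    x = x0.astype(float).copy()
    d = np.diag(A)                  # diagonal entries a_ii
    R = A - np.diag(d)              # off-diagonal part of A
    for k in range(M):
        u = (b - R @ x) / d         # u_i = (b_i - sum_{j != i} a_ij x_j) / a_ii
        if np.linalg.norm(u - x, np.inf) < tol:
            return u, k + 1
        x = u
    return x, M

A = np.array([[2.0, 1.0], [1.0, 2.0]])
b = np.array([6.0, 6.0])
print(jacobi(A, b, np.zeros(2)))    # converges to the exact solution x1 = x2 = 2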

An obvious improvement of the Jacobi iteration is to use the most recent estimate for $x_i$ as soon as it becomes available. For the example above this results in
\[
x_1^{(k)} = \bigl(6 - x_2^{(k-1)}\bigr)/2, \qquad
x_2^{(k)} = \bigl(6 - x_1^{(k)}\bigr)/2, \qquad k = 1, 2, 3, \ldots.
\]
This method is known as Gauss-Seidel iteration, and the general algorithm is

Algorithm (Gauss-Seidel iteration)

Input $A$, $b$, $x$, $n$, $M$
for $k = 1$ to $M$ do
    for $i = 1$ to $n$ do
        $x_i = \bigl(b_i - \sum_{j \ne i} a_{ij} x_j\bigr)/a_{ii}$
Output $x$

Remarks:

1. Both algorithms can be made more efficient by moving the division by $a_{ii}$ to a preprocessing step.

2. For a generic problem Gauss-Seidel iteration converges faster than Jacobi iteration. However, there are certain problems for which Jacobi iteration converges faster.

3. Jacobi iteration has the advantage that it can be implemented on a parallel computer in a straightforward manner.

In a more abstract setting we want to consider the solution of the linear system $Ax = b$ by an iterative algorithm of the form
\[
x^{(k)} = G x^{(k-1)} + c, \qquad k = 1, 2, 3, \ldots, \tag{14}
\]
where $x^{(0)}$ is our initial guess, $G$ is a (constant) iteration matrix, and $c$ is a (constant) vector. All of the classical solvers discussed here are based on a splitting of $A$ of the form
\[
A = (A - Q) + Q
\]
with splitting matrix $Q$. Thus, provided $Q^{-1}$ exists,
\[
Ax = b \iff Qx + (A - Q)x = b \iff Qx = (Q - A)x + b \tag{15}
\]
\[
\iff x = \underbrace{\bigl(I - Q^{-1}A\bigr)}_{=G}\, x + \underbrace{Q^{-1}b}_{=c}. \tag{16}
\]
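For comparison with the Jacobi sketch, here is a minimal NumPy sketch of the Gauss-Seidel algorithm above (again our own illustration with assumed names); the only change is that each $x_i$ is overwritten as soon as it is updated:

import numpy as np

def gauss_seidel(A, b, x0, M=100, tol=1e-10):
    """Gauss-Seidel iteration: overwrite x_i as soon as it is computed."""
    x = x0.astype(float).copy()
    n = len(b)
    for k in range(M):
        x_old = x.copy()
        for i in range(n):
            # new values x_j for j < i, old values for j > i
            s = A[i, :i] @ x[:i] + A[i, i+1:] @ x[i+1:]
            x[i] = (b[i] - s) / A[i, i]
        if np.linalg.norm(x - x_old, np.inf) < tol:
            return x, k + 1
    return x, M

A = np.array([[2.0, 1.0], [1.0, 2.0]])
b = np.array([6.0, 6.0])
print(gauss_seidel(A, b, np.zeros(2)))   # needs roughly half as many iterations as Jacobi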

Our goals in choosing the splitting matrix $Q$ are:

1. $Q$ should be invertible, and systems with $Q$, i.e., (15), should be easy to solve.

2. The iteration (14) should converge quickly to a fixed point.

The two extreme choices $Q = A$ and $Q = I$ correspond to the original problem $Ax = b$ (so that nothing has been gained), and the so-called Richardson method
\[
x^{(k)} = (I - A)x^{(k-1)} + b = x^{(k-1)} + r^{(k-1)}
\]
(for which no system needs to be solved, but which, depending on $A$, may not converge). Therefore, one usually chooses $Q$ somewhere between these two extremes.

We can see that the Jacobi iteration as defined above corresponds to the choice $Q = \mathrm{diag}(A)$. To see this, note that
\[
Q^{-1}A =
\begin{bmatrix} \frac{1}{a_{11}} & & \\ & \ddots & \\ & & \frac{1}{a_{nn}} \end{bmatrix}
\begin{bmatrix} a_{11} & a_{12} & \ldots & a_{1n} \\ a_{21} & a_{22} & \ldots & a_{2n} \\ \vdots & & & \vdots \\ a_{n1} & a_{n2} & \ldots & a_{nn} \end{bmatrix}
=
\begin{bmatrix} 1 & \frac{a_{12}}{a_{11}} & \ldots & \frac{a_{1n}}{a_{11}} \\ \frac{a_{21}}{a_{22}} & 1 & \ldots & \frac{a_{2n}}{a_{22}} \\ \vdots & & & \vdots \\ \frac{a_{n1}}{a_{nn}} & \frac{a_{n2}}{a_{nn}} & \ldots & 1 \end{bmatrix},
\]
i.e., the rows of $A$ are scaled by the reciprocals of the diagonal entries. Then we have
\[
Q^{-1}Ax =
\begin{bmatrix}
\bigl(\sum_j a_{1j}x_j\bigr)/a_{11} \\ \bigl(\sum_j a_{2j}x_j\bigr)/a_{22} \\ \vdots \\ \bigl(\sum_j a_{nj}x_j\bigr)/a_{nn}
\end{bmatrix}
\qquad \text{and} \qquad
Q^{-1}b =
\begin{bmatrix} b_1/a_{11} \\ b_2/a_{22} \\ \vdots \\ b_n/a_{nn} \end{bmatrix},
\]
so that (using (16) componentwise)
\[
x_i^{(k)} = x_i^{(k-1)} - \Bigl(\sum_j a_{ij}x_j^{(k-1)}\Bigr)/a_{ii} + b_i/a_{ii}
= \Bigl(b_i - \sum_{j \ne i} a_{ij}x_j^{(k-1)}\Bigr)/a_{ii},
\]
which is exactly the formula used in the algorithm.

The Gauss-Seidel iteration algorithm can be derived by taking $Q$ as the lower triangle of $A$ (including the diagonal), i.e.,
\[
Q_{ij} = \begin{cases} a_{ij}, & i \ge j, \\ 0, & \text{otherwise}. \end{cases}
\]
Thus (15) can be written as $Qx^{(k)} = (Q - A)x^{(k-1)} + b$, i.e.,
\[
\begin{bmatrix} a_{11} & & & \\ a_{21} & a_{22} & & \\ \vdots & & \ddots & \\ a_{n1} & a_{n2} & \ldots & a_{nn} \end{bmatrix} x^{(k)}
= -\begin{bmatrix} 0 & a_{12} & \ldots & a_{1n} \\ & 0 & \ddots & \vdots \\ & & \ddots & a_{n-1,n} \\ & & & 0 \end{bmatrix} x^{(k-1)} + b.
\]

If we solve this for $x_i^{(k)}$, $i = 1, \ldots, n$, we get
\[
\begin{aligned}
x_1^{(k)} &= \Bigl(-\sum_{j=2}^{n} a_{1j}x_j^{(k-1)} + b_1\Bigr)/a_{11}, \\
x_2^{(k)} &= \Bigl(-\sum_{j=3}^{n} a_{2j}x_j^{(k-1)} + b_2 - a_{21}x_1^{(k)}\Bigr)/a_{22}, \\
&\ \,\vdots \\
x_i^{(k)} &= \Bigl(-\sum_{j=i+1}^{n} a_{ij}x_j^{(k-1)} + b_i - \sum_{j=1}^{i-1} a_{ij}x_j^{(k)}\Bigr)/a_{ii}, \\
&\ \,\vdots \\
x_n^{(k)} &= \Bigl(b_n - \sum_{j=1}^{n-1} a_{nj}x_j^{(k)}\Bigr)/a_{nn}.
\end{aligned}
\]
This, however, is equivalent to
\[
x_i = \Bigl(b_i - \sum_{j \ne i} a_{ij}x_j\Bigr)/a_{ii}
\]
with the most recently computed values used on the right-hand side, which is the formula from the algorithm.

We now turn to a discussion of the properties required of a general iteration matrix $G$ (or splitting matrix $Q$) so that the fixed point iteration (14) is guaranteed to converge.

Theorem 4.14 If $\|G\| = \|I - Q^{-1}A\| < 1$ then (14) converges to a solution of $Ax = b$ for any starting vector $x^{(0)}$.

Proof: It is clear that (14) is a fixed point iteration ($x = F(x)$), and the fixed point is a solution of $Ax = b$. This follows immediately from the equivalence transformations involving (15) and (16). Now, let $e^{(k)} = x^{(k)} - x$, where $x$ is the fixed point solution. Then
\[
e^{(k)} = \bigl(Gx^{(k-1)} + c\bigr) - (Gx + c) = G\bigl(x^{(k-1)} - x\bigr) = Ge^{(k-1)}.
\]
Taking norms, we have
\[
\|e^{(k)}\| = \|Ge^{(k-1)}\| \le \|G\|\,\|e^{(k-1)}\|.
\]
Proceeding recursively we get
\[
\|e^{(k)}\| \le \|G\|^k \|e^{(0)}\|,
\]
so that $\|e^{(k)}\| \to 0$ if $\|G\| < 1$. Of course, this ensures $x^{(k)} \to x$, i.e., convergence of the method.
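The splitting point of view can be checked numerically. The sketch below (our own, with assumed names; forming $G$ explicitly is done only for illustration, since in practice one solves systems with $Q$ as in (15)) builds $G = I - Q^{-1}A$ and $c = Q^{-1}b$ for the Jacobi and Gauss-Seidel splittings of the $2 \times 2$ example and checks the sufficient condition of Theorem 4.14:

import numpy as np

def splitting_iteration(A, b, Q, x0, M=50):
    """Fixed point iteration (14) with G = I - Q^{-1}A and c = Q^{-1}b from (16)."""
    G = np.eye(len(b)) - np.linalg.solve(Q, A)
    c = np.linalg.solve(Q, b)
    print("||G||_inf =", np.linalg.norm(G, np.inf))   # < 1 suffices (Theorem 4.14)
    x = x0.astype(float).copy()
    for _ in range(M):
        x = G @ x + c
    return x

A = np.array([[2.0, 1.0], [1.0, 2.0]])
b = np.array([6.0, 6.0])
print(splitting_iteration(A, b, np.diag(np.diag(A)), np.zeros(2)))  # Jacobi: Q = diag(A)
print(splitting_iteration(A, b, np.tril(A), np.zeros(2)))           # Gauss-Seidel: Q = lower triangle of A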

Corollary 4.15 If $A$ is diagonally dominant, then Jacobi iteration converges for any $x^{(0)}$.

Proof: By the definition of diagonal dominance (Definition 4.8 in Sect. 4.3) the entries of $A$ satisfy
\[
|a_{ii}| > \sum_{j \ne i} |a_{ij}|, \qquad i = 1, \ldots, n. \tag{17}
\]
We need to show $\|I - Q^{-1}A\| < 1$. Since the statement of Theorem 4.14 holds for any matrix norm, we can take $\|\cdot\|_\infty$. Earlier we showed that, for Jacobi iteration,
\[
Q^{-1}A =
\begin{bmatrix} 1 & \frac{a_{12}}{a_{11}} & \ldots & \frac{a_{1n}}{a_{11}} \\ \frac{a_{21}}{a_{22}} & 1 & \ldots & \frac{a_{2n}}{a_{22}} \\ \vdots & & & \vdots \\ \frac{a_{n1}}{a_{nn}} & \frac{a_{n2}}{a_{nn}} & \ldots & 1 \end{bmatrix}.
\]
But then
\[
\|I - Q^{-1}A\|_\infty = \max_{1 \le i \le n} \sum_{j \ne i} \frac{|a_{ij}|}{|a_{ii}|} < 1
\]
by the diagonal dominance of $A$ (17).

In order to be able to formulate a necessary and sufficient condition for the convergence of (14) we need to cite a lemma which follows from the Schur factorization theorem (Theorem 5.19).

Lemma 4.16 Any $n \times n$ matrix $A$ is similar to an upper triangular matrix $U$ whose off-diagonal elements are arbitrarily small, i.e., for any $\epsilon > 0$ there exists a nonsingular $n \times n$ matrix $S$ such that
\[
S^{-1}AS = U = D + T,
\]
where $D = \mathrm{diag}(U)$ (containing the eigenvalues of $A$) and $\|T\| < \epsilon$.

Proof: See textbook p. 214 (proof of Theorem 3).

Remark: Lemma 4.16 says that any $n \times n$ matrix $A$ is "almost" similar to a diagonal matrix. This is what we saw at work when computing eigenvalues via the QR iteration algorithm.

The next ingredient requires that we introduce the spectral radius of $A$, denoted by
\[
\rho(A) = \max_{1 \le i \le n} \{ |\lambda_i| : \lambda_i \text{ an eigenvalue of } A \}.
\]
Theorem 4.17 The spectral radius of $A$ is the greatest lower bound for the matrix norms of $A$ induced by vector norms, i.e., $\rho(A) = \inf \|A\|$, where the infimum is taken over all induced matrix norms.

Proof: First we show $\rho(A) \le \inf \|A\|$. Let $(\lambda, x)$ be the eigenpair of $A$ for which $\rho(A) = |\lambda|$. Then, for any induced matrix norm,
\[
\|A\| = \sup_{\|u\|=1} \|Au\| \ge \frac{\|Ax\|}{\|x\|} = \frac{\|\lambda x\|}{\|x\|} = |\lambda| = \rho(A).
\]
Taking the infimum over all induced matrix norms we get $\inf \|A\| \ge \rho(A)$.

Next we show that $\rho(A) \ge \inf \|A\|$. Applying the maximum norm to the factorization of Lemma 4.16 we have
\[
\|S^{-1}AS\|_\infty = \|D + T\|_\infty \le \|D\|_\infty + \|T\|_\infty.
\]
Now, Lemma 4.16 ensures that for any $\epsilon > 0$ we have $\|T\|_\infty < \epsilon$. Moreover, $D = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$ so that $\|D\|_\infty = \rho(A)$. Therefore,
\[
\|S^{-1}AS\|_\infty \le \rho(A) + \epsilon.
\]
From homework problem 4.6#6 we know that there exists another induced norm such that $\|A\| = \|S^{-1}AS\|_\infty$. Therefore $\|A\| \le \rho(A) + \epsilon$. Finally, we can take the infimum and obtain $\inf \|A\| \le \rho(A) + \epsilon$. Since this statement holds for arbitrary $\epsilon$ we have $\inf \|A\| \le \rho(A)$ and are done.

Now we can prove

Theorem 4.18 The iteration $x^{(k)} = Gx^{(k-1)} + c$ (cf. (14)) converges to the solution of $Ax = b$ for any initial guess $x^{(0)}$ if and only if $\rho(G) < 1$.

Proof: First we assume $\rho(G) < 1$ and show convergence. Theorem 4.17 tells us that there exists a matrix norm for which $\|G\| < 1$, and then Theorem 4.14 ensures convergence.

To prove the reverse implication we assume $\rho(G) \ge 1$ and deduce that the fixed point iteration does not converge for every starting vector. Let $(\lambda, u)$ be an eigenpair of $G$ with $|\lambda| \ge 1$. Then we choose $x^{(0)}$ as $x^{(0)} = u + x$, where $x$ is the fixed point of (14). Now, using (14) we have
\[
x^{(k)} - x = G\bigl(x^{(k-1)} - x\bigr).
\]

By applying this idea recursively we obtain
\[
x^{(k)} - x = G^k\bigl(x^{(0)} - x\bigr) = G^k u = \lambda^k u.
\]
By the choice of $(\lambda, u)$, with $|\lambda| \ge 1$ and $u \ne 0$, we see that $x^{(k)}$ does not converge to $x$.

A sufficient convergence criterion for Gauss-Seidel iteration can now also be formulated.

Theorem 4.19 If $A$ is symmetric and positive definite then Gauss-Seidel iteration converges for any initial vector $x^{(0)}$.

Remark: The statement of the theorem can be relaxed. All that is needed is that $A$ is symmetric positive semi-definite with positive diagonal entries, and such that a solution of $Ax = b$ exists. This slightly weaker version ensures that the normal equations can be solved using Gauss-Seidel iteration.

Proof of Theorem 4.19: According to Theorem 4.18 we need to show $\rho(G) < 1$, where, for Gauss-Seidel iteration, we have
\[
G = I - Q^{-1}A = I - (L + D)^{-1}A. \tag{18}
\]
Here we have decomposed $A$ into
\[
A = \underbrace{L + D}_{=Q} + \underbrace{L^T}_{=A-Q},
\]
where $L$ is the lower triangular part of $A$ without the diagonal and $D = \mathrm{diag}(A)$.

We begin by defining a vector norm
\[
\|x\|_A = \sqrt{x^T A x},
\]
which induces a matrix norm
\[
\|G\|_A = \max_{x \ne 0} \frac{\|Gx\|_A}{\|x\|_A}.
\]
This allows us to make use of Theorem 4.17, and we can instead show
\[
\frac{\|Gx\|_A^2}{\|x\|_A^2} = \frac{(Gx)^T A Gx}{x^T A x} < 1, \tag{19}
\]
since then $\rho(G) \le \|G\|_A < 1$. Now, (19) is equivalent to
\[
x^T\bigl(A - G^T A G\bigr)x > 0. \tag{20}
\]
Using the definition of $G$ (18) some messy manipulations yield
\[
A - G^T A G = A(L^T + D)^{-1} D (L + D)^{-1} A
= \bigl[A(L^T + D)^{-1} D^{1/2}\bigr]\bigl[D^{1/2}(L + D)^{-1} A\bigr],
\]
where the positive definiteness of $A$ ($d_i > 0$) permits the last step (taking the square root of $D$).

Returning to (20), and using the symmetry of $A$, we have
\[
x^T\bigl(A - G^T A G\bigr)x
= \underbrace{x^T A (L^T + D)^{-1} D^{1/2}}_{=y^T}\,\underbrace{D^{1/2}(L + D)^{-1} A x}_{=y}
= y^T y \ge 0.
\]
In fact, this inequality is strict for $x \ne 0$ since $D^{1/2}(L + D)^{-1}A$ has full rank.

Rate of Convergence

We want to have an estimate for how many iterations are required in the fixed point iteration $x^{(k)} = Gx^{(k-1)} + c$ to reduce the initial error $e^{(0)} = x^{(0)} - x$ by a factor of $10^{-\alpha}$ for some positive exponent $\alpha$. We know (see the proof of Theorem 4.14) that the fixed point iteration scheme leads to an error estimate of the form
\[
\|e^{(k)}\| \le \|G\|^k \|e^{(0)}\|.
\]
Now, using Theorem 4.17 we can conclude that essentially
\[
\|e^{(k)}\| \le [\rho(G)]^k \|e^{(0)}\|.
\]
Our goal is to reduce the initial error by a factor $10^{-\alpha}$. Therefore, we want
\[
[\rho(G)]^k \|e^{(0)}\| \le 10^{-\alpha} \|e^{(0)}\|,
\qquad \text{or} \qquad
[\rho(G)]^k \le 10^{-\alpha}.
\]
Thus
\[
k \ln(\rho(G)) \le -\alpha \ln 10.
\]
Since, for a convergent fixed point iteration, we have $0 < \rho(G) < 1$ (so that $\ln \rho(G) < 0$), we get
\[
k \ge \frac{\alpha \ln 10}{-\ln(\rho(G))}.
\]
If we define the rate of convergence
\[
r = \frac{-\ln(\rho(G))}{\ln 10} = -\log_{10}(\rho(G)),
\]
then we should have $k \ge \alpha/r$ to ensure the desired improvement.
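A small sketch (our own, with assumed function names) that computes $\rho(G)$, the rate of convergence $r$, and the resulting iteration estimate $k \ge \alpha/r$ for the Jacobi and Gauss-Seidel splittings of the $2 \times 2$ example:

import numpy as np

def convergence_info(A, Q, alpha):
    """Spectral radius rho(G), rate r = -log10 rho(G), and the estimated
    number of iterations k >= alpha / r needed to gain alpha digits."""
    G = np.eye(A.shape[0]) - np.linalg.solve(Q, A)
    rho = max(abs(np.linalg.eigvals(G)))        # rho(G) < 1 iff convergent (Theorem 4.18)
    r = -np.log10(rho)
    return rho, r, int(np.ceil(alpha / r))

A = np.array([[2.0, 1.0], [1.0, 2.0]])
print(convergence_info(A, np.diag(np.diag(A)), alpha=6))   # Jacobi:       rho = 0.5,  k ~ 20
print(convergence_info(A, np.tril(A), alpha=6))            # Gauss-Seidel: rho = 0.25, k ~ 10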

Successive Over-Relaxation (SOR)

Another iterative solver can be obtained by computing the new iterate as a weighted combination of the previous iterate $x^{(k-1)}$ and the Gauss-Seidel update, i.e.,
\[
x^{(k)} = (1 - \omega)\,x^{(k-1)} + \omega\, x_{GS}^{(k)},
\]
with the relaxation parameter $\omega$ chosen appropriately. This is a generalization of the Gauss-Seidel iteration (corresponding to $\omega = 1$). The standard SOR algorithm is as follows (a more general version is discussed in the textbook):

Algorithm (SOR iteration)

Input $A$, $b$, $x$, $\omega$, $n$, $M$
for $k = 1$ to $M$ do
    for $i = 1$ to $n$ do
        $u_i = (1 - \omega)\,x_i + \omega\Bigl(b_i - \sum_{j=1}^{i-1} a_{ij}u_j - \sum_{j=i+1}^{n} a_{ij}x_j\Bigr)/a_{ii}$
    for $i = 1$ to $n$ do
        $x_i = u_i$
Output $x$

Theorem 4.20 If $A$ is symmetric and positive definite and $0 < \omega < 2$ then SOR iteration converges for any starting value $x^{(0)}$.

Proof: omitted.

Remarks:

1. The optimal value of $\omega$ is known only for special cases. For example, if $A$ is tridiagonal, then
\[
\omega_{\mathrm{opt}} = \frac{2}{1 + \sqrt{1 - \rho(G)}},
\]
where $G$ is the iteration matrix for the Gauss-Seidel iteration.

2. For (sparse) matrices with special structure the basic iteration algorithms can be implemented much more efficiently. For example, for the Poisson problem discussed at the beginning of the semester Jacobi iteration becomes

for $k = 1$ to $M$ do
    for $i = 1$ to $n-1$ do
        for $j = 1$ to $n-1$ do
            $u_{ij}^{(k)} = \bigl(u_{i-1,j}^{(k-1)} + u_{i,j-1}^{(k-1)} + u_{i+1,j}^{(k-1)} + u_{i,j+1}^{(k-1)} + f_{ij}/n^2\bigr)/4$

Note that even though the system matrix $A$ is of size $n^2 \times n^2$, only five entries per row are non-zero, and they are hard-coded into the algorithm. Thus, the matrix $A$ is never explicitly and completely formed. In Matlab the $i$-$j$ double loops can be implemented in one single line of code (a sketch of this is given after these remarks).

3. A few other classical iterative methods such as symmetric successive over-relaxation (SSOR) and Chebyshev iteration are discussed in the textbook.

4. Due to their rather slow convergence, classical iterative solvers are no longer very popular as solvers of linear systems. However, they are frequently used as preconditioners for more sophisticated methods (e.g., conjugate gradient or multigrid).
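To illustrate remark 2, here is a NumPy analogue (our own sketch; the grid layout, boundary treatment, and names are assumptions) of the matrix-free Jacobi sweep for the discretized Poisson problem; the five-point stencil update is a single vectorized statement and the matrix $A$ is never formed:

import numpy as np

def jacobi_poisson(f, M):
    """Jacobi sweeps for the discretized Poisson problem on an n x n grid
    with mesh width h = 1/n and zero boundary values; the five-point stencil
    is hard-coded, so the n^2 x n^2 matrix A is never assembled."""
    n = f.shape[0] + 1                     # f holds the (n-1) x (n-1) interior values
    u = np.zeros((n + 1, n + 1))           # includes the boundary, which stays zero
    for _ in range(M):
        u[1:-1, 1:-1] = (u[:-2, 1:-1] + u[1:-1, :-2] +
                         u[2:, 1:-1] + u[1:-1, 2:] + f / n**2) / 4.0
    return u

n = 32
f = np.ones((n - 1, n - 1))                # right-hand side values f_ij = 1
u = jacobi_poisson(f, M=500)
print(u.max())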

4.7 Steepest Descent and Conjugate Gradient Methods

Even though the SOR method may converge faster than Jacobi or Gauss-Seidel iteration, it is difficult to choose a good relaxation parameter $\omega$. Therefore, most of the time another class of iterative solvers is used for linear systems with symmetric positive definite system matrices. These methods are based on

Theorem 4.21 If $A$ is symmetric positive definite, then solving $Ax = b$ is equivalent to minimizing the quadratic form
\[
q(x) = \langle x, Ax \rangle - 2\langle x, b \rangle.
\]
Remarks:

1. Here we have used the notation $\langle x, y \rangle = x^T y = \sum_{i=1}^{n} x_i y_i$ for the standard inner product in $\mathbb{R}^n$.

2. The family of optimization-based iterative solvers is also known as Krylov subspace iteration in the literature.

Proof: First we consider only changes of $q$ along the ray $x + tv$, where $t \in \mathbb{R}$ and $v \ne 0$ is a fixed direction. By definition
\[
\begin{aligned}
q(x + tv) &= \langle x + tv, A(x + tv) \rangle - 2\langle x + tv, b \rangle \\
&= \langle x, Ax \rangle + \langle tv, Ax \rangle + \langle x, A(tv) \rangle + \langle tv, A(tv) \rangle - 2\langle x, b \rangle - 2\langle tv, b \rangle \\
&= \underbrace{\langle x, Ax \rangle - 2\langle x, b \rangle}_{=q(x)} + \langle tv, Ax \rangle + \langle x, A(tv) \rangle + \langle tv, A(tv) \rangle - 2\langle tv, b \rangle.
\end{aligned} \tag{21}
\]
Now, using the symmetry of $A$,
\[
\langle x, A(tv) \rangle = t\,x^T(Av) = t\,\bigl(v^T A^T x\bigr)^T = t\,\bigl(v^T A x\bigr)^T = t\,v^T A x,
\]
where the last equality holds because the inner product is a scalar quantity. Therefore,
\[
\langle x, A(tv) \rangle = t\,\langle v, Ax \rangle. \tag{22}
\]
Inserting (22) into (21) we get
\[
q(x + tv) = q(x) + 2t\langle v, Ax \rangle + t^2\langle v, Av \rangle - 2t\langle v, b \rangle
= q(x) + 2t\langle v, Ax - b \rangle + t^2\langle v, Av \rangle. \tag{23}
\]
Note that here $\langle v, Av \rangle > 0$ since $A$ is positive definite and $v \ne 0$. Therefore, as a quadratic function of $t$ (with positive leading coefficient), $q(x + tv)$ has a minimum. A necessary condition for the minimum (along the ray $x + tv$) is
\[
\frac{d}{dt} q(x + tv) = 0.
\]
Thus, we calculate
\[
\frac{d}{dt} q(x + tv) = 2\langle v, Ax - b \rangle + 2t\langle v, Av \rangle. \tag{24}
\]
Clearly, (24) is zero for
\[
\hat{t} = -\frac{\langle v, Ax - b \rangle}{\langle v, Av \rangle} = \frac{\langle v, b - Ax \rangle}{\langle v, Av \rangle}.
\]
The value of the minimum is obtained from (23):
\[
\begin{aligned}
q(x + \hat{t}v) &= q(x) - 2\frac{\langle v, b - Ax \rangle}{\langle v, Av \rangle}\langle v, b - Ax \rangle + \frac{\langle v, b - Ax \rangle^2}{\langle v, Av \rangle^2}\langle v, Av \rangle \\
&= q(x) - 2\frac{\langle v, b - Ax \rangle^2}{\langle v, Av \rangle} + \frac{\langle v, b - Ax \rangle^2}{\langle v, Av \rangle} \\
&= q(x) - \frac{\langle v, b - Ax \rangle^2}{\langle v, Av \rangle}.
\end{aligned}
\]
Now, $q(x + \hat{t}v) < q(x)$ if and only if $\langle v, b - Ax \rangle \ne 0$, i.e., the direction $v$ is not orthogonal to the residual. Thus the value of $q$ can be reduced by moving from $x$ to $x + \hat{t}v$ along a direction $v$ that is not orthogonal to the residual $b - Ax$. To conclude the proof we have two possibilities to consider:

1. $x$ is a solution of $Ax = b$, i.e., $b - Ax = 0$. Then $q(x)$ is actually the minimum value, and no choice of $v$ can reduce the value of $q$ (i.e., $x$ a solution of $Ax = b$ $\Longrightarrow$ $x$ a minimizer of $q$).

2. $x$ is such that $Ax \ne b$. Then there exists a direction $v$ such that $\langle v, b - Ax \rangle \ne 0$, and so $q(x)$ is not a minimum (and thus can be improved, which we do until we have reached the minimum and case 1). So, $x$ a minimizer of $q$ $\Longrightarrow$ $x$ a solution of $Ax = b$.

The claim is established.

The argument used at the end of the proof suggests an iterative algorithm for the solution of $Ax = b$ via minimization of $q$.

"Algorithm"

Input $A$, $b$, initial guess $x^{(0)}$, initial search direction $v^{(0)}$
for $k = 0, 1, \ldots$ do
    Calculate the stepsize $t = \dfrac{\langle v^{(k)}, b - Ax^{(k)} \rangle}{\langle v^{(k)}, Av^{(k)} \rangle}$
    Update the solution $x^{(k+1)} = x^{(k)} + tv^{(k)}$
    Pick a new search direction $v^{(k+1)}$
Output $x^{(k+1)}$

Remark: The preceding "algorithm" is in quotation marks since it is not yet a complete algorithm. We have not yet answered how to choose the search directions $v^{(k+1)}$. The only constraint we have so far is that $\langle v, b - Ax \rangle \ne 0$.

Method of Steepest Descent

From Calculus we know that the gradient of $q$ indicates the direction of greatest change. Therefore, in the so-called method of steepest descent we choose $v^{(k+1)} = -\nabla q(x^{(k+1)})$. In homework problem 4.7#1 you will show that
\[
\nabla q(x^{(k+1)}) = 2\bigl(Ax^{(k+1)} - b\bigr),
\]
so that (dropping the irrelevant factor 2) the search direction is the residual $v^{(k+1)} = b - Ax^{(k+1)}$, and we get

Algorithm (Steepest descent)

Input $A$, $b$, initial guess $x$
for $k = 0, 1, \ldots$ do
    $v = b - Ax$
    $t = \dfrac{\langle v, v \rangle}{\langle v, Av \rangle}$
    $x = x + tv$        (see the "rough" algorithm above)
Output $x$
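A minimal NumPy sketch (our own illustration, with assumed names) of the steepest descent algorithm above:

import numpy as np

def steepest_descent(A, b, x0, M=1000, tol=1e-10):
    """Steepest descent: search along the residual v = b - Ax (the negative
    gradient direction of q) with the optimal step t = <v,v>/<v,Av>."""
    x = x0.astype(float).copy()
    for k in range(M):
        v = b - A @ x                     # residual = search direction
        if np.linalg.norm(v) < tol:
            return x, k
        t = (v @ v) / (v @ (A @ v))       # optimal stepsize along v
        x = x + t * v
    return x, M

A = np.array([[2.0, 1.0], [1.0, 2.0]])
b = np.array([6.0, 6.0])
print(steepest_descent(A, b, np.zeros(2)))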

Remark: This method is not very practical, as it usually converges rather slowly. Since the level curves of $q$ are ellipses (for problems with two unknowns), the search directions tend to zig-zag down the surface instead of going straight to the bottom of the valley (especially for elongated ellipses, i.e., $\kappa_2(A)$ large). See Figure 4.3 in the textbook.

Conjugate Search Directions

The main ingredient in deriving another one of the "top ten algorithms of the 20th century" (the conjugate gradient method) will be the use of so-called conjugate search directions. To this end we introduce

Definition 4.22 Let $A$ be a symmetric positive definite matrix. Then the inner product
\[
\langle x, y \rangle_A = \langle x, Ay \rangle
\]
is called an $A$-inner product. The $A$-inner product induces a vector norm
\[
\|x\|_A = \sqrt{\langle x, x \rangle_A} = \sqrt{\langle x, Ax \rangle}.
\]
We call a set of vectors $\{u^{(1)}, \ldots, u^{(n)}\}$ $A$-orthonormal if $\langle u^{(i)}, u^{(j)} \rangle_A = \delta_{ij}$.

The main theorem (which will give us the conjugate gradient method) is

Theorem 4.23 Let $\{u^{(1)}, \ldots, u^{(n)}\}$ be an $A$-orthonormal set, and define
\[
x^{(i)} = x^{(i-1)} + \langle b - Ax^{(i-1)}, u^{(i)} \rangle\, u^{(i)}, \qquad i = 1, \ldots, n, \tag{25}
\]
with $x^{(0)}$ an arbitrary starting vector. Then $Ax^{(n)} = b$.

Remarks:

1. Note that Theorem 4.23 says that the iteration (25) yields the solution of $Ax = b$ in $n$ steps, and thus can be considered a direct method. However, in practice this usually does not happen (due to roundoff errors). Historically, this means that after its introduction in 1952 by Hestenes and Stiefel the conjugate gradient method was considered a useless (direct) method, and was long forgotten.

2. In the 1970s it was observed that for certain (large, sparse, and well-conditioned) problems the conjugate gradient method converges extremely fast to an accurate approximate solution of $Ax = b$ (much faster than in $n$ steps), and thus it has become one of the top iterative solvers. In fact, one can show that, for well-conditioned problems, the conjugate gradient method converges in $O(\sqrt{\kappa})$ iterations.
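Before turning to the proof, here is a small numerical check of Theorem 4.23 (our own sketch with an assumed test matrix): the canonical basis is $A$-orthonormalized by Gram-Schmidt in the $A$-inner product, and iteration (25) then reaches the solution in $n$ steps up to roundoff.

import numpy as np

def a_orthonormal_basis(A):
    """Gram-Schmidt in the A-inner product <x,y>_A = <x, Ay>, applied to e^(1),...,e^(n)."""
    n = A.shape[0]
    U = []
    for i in range(n):
        v = np.eye(n)[:, i]
        for u in U:
            v = v - (v @ (A @ u)) * u          # remove the <e^(i), u>_A component
        U.append(v / np.sqrt(v @ (A @ v)))     # normalize in the A-norm
    return np.array(U).T                       # columns are u^(1),...,u^(n)

A = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]])
b = np.array([1.0, 2.0, 3.0])
U = a_orthonormal_basis(A)
x = np.zeros(3)
for i in range(3):                             # iteration (25)
    u = U[:, i]
    x = x + ((b - A @ x) @ u) * u
print(np.linalg.norm(A @ x - b))               # ~1e-16: Ax^(n) = b after n steps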

Proof of Theorem 4.23: From (25) we see that we can define the stepsize as
\[
t_i = \langle b - Ax^{(i-1)}, u^{(i)} \rangle. \tag{26}
\]
Then (25) becomes
\[
x^{(i)} = x^{(i-1)} + t_i u^{(i)}.
\]
We multiply this by $A$ and get
\[
Ax^{(i)} = Ax^{(i-1)} + t_i Au^{(i)}. \tag{27}
\]
Therefore, proceeding recursively,
\[
Ax^{(n)} = Ax^{(n-1)} + t_n Au^{(n)} = Ax^{(n-2)} + t_{n-1} Au^{(n-1)} + t_n Au^{(n)} = \cdots
= Ax^{(0)} + t_1 Au^{(1)} + t_2 Au^{(2)} + \cdots + t_n Au^{(n)}.
\]
Now we subtract $b$ and take the inner product with $u^{(i)}$ (which allows us to use the $A$-orthonormality of $\{u^{(1)}, \ldots, u^{(n)}\}$) to arrive at
\[
\langle Ax^{(n)} - b, u^{(i)} \rangle
= \langle Ax^{(0)} - b, u^{(i)} \rangle + \sum_{j=1}^{n} t_j \underbrace{\langle Au^{(j)}, u^{(i)} \rangle}_{=\delta_{ij}}
= \langle Ax^{(0)} - b, u^{(i)} \rangle + t_i.
\]
If we can show that
\[
t_i = -\langle Ax^{(0)} - b, u^{(i)} \rangle, \tag{28}
\]
then $\langle Ax^{(n)} - b, u^{(i)} \rangle = 0$, and so $Ax^{(n)} - b$ is orthogonal to all $u^{(i)}$, $i = 1, \ldots, n$. In order to see the implications of this statement we observe below that $\{u^{(1)}, \ldots, u^{(n)}\}$ form a basis for $\mathbb{R}^n$, and thus $Ax^{(n)} - b = 0$ (which proves the statement of the theorem).

From the $A$-orthonormality of $\{u^{(1)}, \ldots, u^{(n)}\}$ we know
\[
\langle Au^{(j)}, u^{(i)} \rangle = \delta_{ij},
\]
so that, with $U$ the matrix whose columns are the $u^{(i)}$,
\[
U^T A U = I.
\]
This implies that both $A$ and $U$ are nonsingular, and thus the columns $u^{(i)}$, $i = 1, \ldots, n$, of $U$ form a basis of $\mathbb{R}^n$.

It now remains to establish (28). From (26) we have
\[
\begin{aligned}
t_i &= \langle b - Ax^{(i-1)}, u^{(i)} \rangle \\
&= \langle b - Ax^{(0)} + Ax^{(0)} - Ax^{(1)} + Ax^{(1)} - \cdots - Ax^{(i-2)} + Ax^{(i-2)} - Ax^{(i-1)}, u^{(i)} \rangle \\
&= \langle b - Ax^{(0)}, u^{(i)} \rangle + \underbrace{\langle Ax^{(0)} - Ax^{(1)}, u^{(i)} \rangle}_{=-t_1\langle Au^{(1)}, u^{(i)} \rangle} + \cdots + \underbrace{\langle Ax^{(i-2)} - Ax^{(i-1)}, u^{(i)} \rangle}_{=-t_{i-1}\langle Au^{(i-1)}, u^{(i)} \rangle},
\end{aligned}
\]
where we have used (27). Thus,
\[
t_i = \langle b - Ax^{(0)}, u^{(i)} \rangle
- t_1\underbrace{\langle Au^{(1)}, u^{(i)} \rangle}_{=0}
- \cdots
- t_{i-1}\underbrace{\langle Au^{(i-1)}, u^{(i)} \rangle}_{=0}
= -\langle Ax^{(0)} - b, u^{(i)} \rangle,
\]

which establishes (28).

The main question now is: how do we find $\{u^{(1)}, \ldots, u^{(n)}\}$?

We can use any basis of $\mathbb{R}^n$, and then obtain $\{u^{(1)}, \ldots, u^{(n)}\}$ by applying the Gram-Schmidt algorithm (with respect to the $A$-inner product) to this set. Therefore, we start with the canonical basis $\{e^{(1)}, \ldots, e^{(n)}\}$. Then
\[
u^{(1)} = \frac{e^{(1)}}{\|e^{(1)}\|_A} = \frac{e^{(1)}}{\sqrt{\langle e^{(1)}, Ae^{(1)} \rangle}},
\]
and the general step is
\[
u^{(i)} = \frac{v^{(i)}}{\|v^{(i)}\|_A}, \qquad \text{where} \qquad
v^{(i)} = e^{(i)} - \sum_{j=1}^{i-1} \langle e^{(i)}, u^{(j)} \rangle_A\, u^{(j)}.
\]
Since $\langle e^{(i)}, u^{(j)} \rangle_A = \langle e^{(i)}, Au^{(j)} \rangle = \bigl(Au^{(j)}\bigr)_i$ we have
\[
u^{(1)} = \frac{e^{(1)}}{\sqrt{a_{11}}}
\qquad \text{and} \qquad
v^{(i)} = e^{(i)} - \sum_{j=1}^{i-1} \bigl(Au^{(j)}\bigr)_i\, u^{(j)}.
\]
Before stating the conjugate gradient algorithm we reformulate Theorem 4.23 for an $A$-orthogonal (instead of $A$-orthonormal) set.

Theorem 4.24 Let $\{v^{(1)}, \ldots, v^{(n)}\}$ be an $A$-orthogonal set, and define
\[
t_i = \frac{\langle b - Ax^{(i-1)}, v^{(i)} \rangle}{\|v^{(i)}\|_A^2}.
\]
Then, for any starting vector $x^{(0)}$, the iteration
\[
x^{(i)} = x^{(i-1)} + t_i v^{(i)}, \qquad i = 1, \ldots, n,
\]
yields $Ax^{(n)} = b$.

Proof: Obvious.

Remark: Here the search directions $v^{(i)}$ are fixed before the algorithm is started. It is better to determine $v^{(i)}$ when needed during the algorithm by making sure that at each step the components in the previously used directions are removed. This is done in the following algorithm.

Algorithm (Conjugate gradient method)

(1) Input $A$, $b$, $x^{(0)}$, $M$, $\epsilon$
(2) $r^{(0)} = b - Ax^{(0)}$
(3) $v^{(0)} = r^{(0)}$
(4) Output $0$, $x^{(0)}$, $r^{(0)}$
(5) for $k = 0$ to $M-1$ do
(6)     if $v^{(k)} = 0$ then stop
(7)     $t_k = \dfrac{\|r^{(k)}\|_2^2}{\|v^{(k)}\|_A^2}$
(8)     $x^{(k+1)} = x^{(k)} + t_k v^{(k)}$
(9)     $r^{(k+1)} = r^{(k)} - t_k A v^{(k)}$
(10)    if $\|r^{(k+1)}\|_2 < \epsilon$ then stop
(11)    $s_k = \dfrac{\|r^{(k+1)}\|_2^2}{\|r^{(k)}\|_2^2}$
(12)    $v^{(k+1)} = r^{(k+1)} + s_k v^{(k)}$
(13)    Output $k+1$, $x^{(k+1)}$, $r^{(k+1)}$
(14) end

Remarks:

1. A computer version (with lots of overwriting) is listed in the textbook.

2. The conjugate gradient algorithm is very efficient since the bulk of the work for each iteration is the one matrix-vector multiplication $Av^{(k)}$ needed for the update of the residual. It is essential that this operation be customized for the matrix $A$. For many sparse matrices it can be done in $O(n)$ operations, and thus the overall operations count of the conjugate gradient method for many problems is also $O(n)$.

3. The Matlab script CGDemo.m illustrates the convergence behavior of the conjugate gradient algorithm. In this example the CG algorithm is applied to a symmetric positive definite matrix. The parameter $\tau$ affects the number of nonzero entries as well as the 2-norm condition number of $A$. This is done by replacing all off-diagonal entries of $A$ with $|a_{ij}| > \tau$ by zero, i.e., for small $\tau$ the matrix $A$ is very sparse and well-conditioned, whereas for large $\tau$ both the number of non-zero entries and the condition number increase. We can see that for the ill-conditioned case no convergence at all is achieved in the allowed 20 iterations, whereas for the well-conditioned matrix the CG algorithm converges in 8 iterations to a solution with the desired relative residual accuracy.
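A NumPy sketch of the conjugate gradient algorithm above (our own illustration; the function name and return values are assumptions), with the line numbers of the algorithm noted in comments:

import numpy as np

def conjugate_gradient(A, b, x0, M=None, eps=1e-10):
    """Conjugate gradient method, following lines (1)-(13) of the algorithm above."""
    x = x0.astype(float).copy()
    r = b - A @ x                           # line (2)
    v = r.copy()                            # line (3)
    if M is None:
        M = len(b)                          # at most n steps in exact arithmetic
    for k in range(M):                      # line (5)
        if not np.any(v):                   # line (6): v^(k) = 0
            break
        Av = A @ v
        t = (r @ r) / (v @ Av)              # line (7): t_k = ||r||_2^2 / ||v||_A^2
        x = x + t * v                       # line (8)
        r_new = r - t * Av                  # line (9)
        if np.linalg.norm(r_new) < eps:     # line (10)
            return x, k + 1
        s = (r_new @ r_new) / (r @ r)       # line (11)
        v = r_new + s * v                   # line (12)
        r = r_new
    return x, M

A = np.array([[2.0, 1.0], [1.0, 2.0]])
b = np.array([6.0, 6.0])
print(conjugate_gradient(A, b, np.zeros(2)))   # exact solution after at most 2 steps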

We now need to show that this algorithm agrees with the theory developed so far. Thus, we need to show that $\{v^{(0)}, v^{(1)}, \ldots, v^{(n-1)}\}$ is $A$-orthogonal (see c) in Theorem 4.25). Moreover, we will show that the residuals $\{r^{(0)}, r^{(1)}, \ldots, r^{(n-1)}\}$ are orthogonal (cf. e)), and that they can be computed recursively as formulated in the algorithm (follows from d) below).

Theorem 4.25 If we use the conjugate gradient algorithm to solve the linear system $Ax = b$, where $A$ is an $n \times n$ symmetric positive definite matrix, then for any $m < n$

a) $\langle r^{(m)}, v^{(i)} \rangle = 0$, $(0 \le i < m)$,
b) $\langle r^{(i)}, r^{(i)} \rangle = \langle r^{(i)}, v^{(i)} \rangle$, $(0 \le i \le m)$,
c) $\langle v^{(m)}, Av^{(i)} \rangle = 0$, $(0 \le i < m)$,
d) $r^{(i)} = b - Ax^{(i)}$, $(0 \le i \le m)$,
e) $\langle r^{(m)}, r^{(i)} \rangle = 0$, $(0 \le i < m)$,
f) $r^{(i)} \ne 0$, $(0 \le i \le m)$.

Remark: Items a) and b) are not directly related to the claims formulated above, but are required to prove the other statements.

Proof: We use induction on $m$. The beginning, $m = 0$, is trivial. Now we assume that the statements a)-f) hold for $m$ and prove they are true for $m+1$.

a) Show $\langle r^{(m+1)}, v^{(i)} \rangle = 0$, $(0 \le i \le m)$. First, we treat the case $i = m$. Using the recursive definition of the residuals from line (9) of the CG algorithm we have
\[
\langle r^{(m+1)}, v^{(m)} \rangle = \langle r^{(m)} - t_m Av^{(m)}, v^{(m)} \rangle
= \langle r^{(m)}, v^{(m)} \rangle - t_m \langle Av^{(m)}, v^{(m)} \rangle.
\]
The definition of the stepsize, line (7), yields
\[
\langle r^{(m+1)}, v^{(m)} \rangle
= \langle r^{(m)}, v^{(m)} \rangle - \frac{\|r^{(m)}\|_2^2}{\|v^{(m)}\|_A^2}\langle Av^{(m)}, v^{(m)} \rangle
= \langle r^{(m)}, v^{(m)} \rangle - \langle r^{(m)}, r^{(m)} \rangle = 0,
\]
where the last simplification follows from the induction hypothesis for statement b).

Now we discuss the case $i < m$. From line (9) of the algorithm we get
\[
\langle r^{(m+1)}, v^{(i)} \rangle = \langle r^{(m)} - t_m Av^{(m)}, v^{(i)} \rangle
= \underbrace{\langle r^{(m)}, v^{(i)} \rangle}_{=0} - t_m \underbrace{\langle Av^{(m)}, v^{(i)} \rangle}_{=0} = 0,
\]
where we have used the induction hypotheses for statements a) and c), respectively.

b) We show $\langle r^{(m+1)}, r^{(m+1)} \rangle = \langle r^{(m+1)}, v^{(m+1)} \rangle$. Using line (12) of the algorithm,
\[
\langle r^{(m+1)}, v^{(m+1)} \rangle = \langle r^{(m+1)}, r^{(m+1)} + s_m v^{(m)} \rangle
= \langle r^{(m+1)}, r^{(m+1)} \rangle + s_m \underbrace{\langle r^{(m+1)}, v^{(m)} \rangle}_{=0}
= \langle r^{(m+1)}, r^{(m+1)} \rangle.
\]

Here we have made use of the statement just proved in a) above.

c) We need to show $\langle v^{(m+1)}, Av^{(i)} \rangle = 0$ for $0 \le i \le m$. First we discuss the case $0 \le i < m$. Line (12) of the algorithm yields
\[
\langle v^{(m+1)}, Av^{(i)} \rangle = \langle r^{(m+1)} + s_m v^{(m)}, Av^{(i)} \rangle
= \langle r^{(m+1)}, Av^{(i)} \rangle + s_m \langle v^{(m)}, Av^{(i)} \rangle.
\]
Next, line (9) of the algorithm implies
\[
Av^{(k)} = \frac{1}{t_k}\bigl(r^{(k)} - r^{(k+1)}\bigr).
\]
Using this formula on the first term of the right-hand side above we have
\[
\langle v^{(m+1)}, Av^{(i)} \rangle
= \frac{1}{t_i}\langle r^{(m+1)}, r^{(i)} - r^{(i+1)} \rangle + s_m \langle v^{(m)}, Av^{(i)} \rangle
= \frac{1}{t_i}\bigl\langle r^{(m+1)}, v^{(i)} - s_{i-1}v^{(i-1)} - \bigl(v^{(i+1)} - s_i v^{(i)}\bigr) \bigr\rangle + s_m \langle v^{(m)}, Av^{(i)} \rangle,
\]
where we have used line (12) (together with the understanding that $s_{-1} = 0$, $v^{(-1)} = 0$) to replace $r^{(i)}$ and $r^{(i+1)}$. We now expand the right-hand side, and get
\[
\langle v^{(m+1)}, Av^{(i)} \rangle
= \frac{1}{t_i}\Bigl(\underbrace{\langle r^{(m+1)}, v^{(i)} \rangle}_{=0\ \text{by a)}}
- s_{i-1}\underbrace{\langle r^{(m+1)}, v^{(i-1)} \rangle}_{=0\ \text{by a)}}
- \underbrace{\langle r^{(m+1)}, v^{(i+1)} \rangle}_{=0\ \text{by a)}}
+ s_i\underbrace{\langle r^{(m+1)}, v^{(i)} \rangle}_{=0\ \text{by a)}}\Bigr)
+ s_m\underbrace{\langle v^{(m)}, Av^{(i)} \rangle}_{=0\ \text{for } i<m}
= 0.
\]
Here we have used the result already proven in a) as well as the induction hypothesis for c). Note that the third and the fifth terms above are the reason why we now have to discuss the case $i = m$ separately. We can proceed as above, but in the end are left with
\[
\langle v^{(m+1)}, Av^{(m)} \rangle = -\frac{1}{t_m}\langle r^{(m+1)}, v^{(m+1)} \rangle + s_m\langle v^{(m)}, Av^{(m)} \rangle.
\]
Now, replacing the stepsizes $t_m$ and $s_m$ using lines (7) and (11), respectively, we get
\[
\begin{aligned}
\langle v^{(m+1)}, Av^{(m)} \rangle
&= -\frac{\|v^{(m)}\|_A^2}{\|r^{(m)}\|_2^2}\langle r^{(m+1)}, v^{(m+1)} \rangle + \frac{\|r^{(m+1)}\|_2^2}{\|r^{(m)}\|_2^2}\langle v^{(m)}, Av^{(m)} \rangle \\
&= -\frac{\|v^{(m)}\|_A^2}{\|r^{(m)}\|_2^2}\langle r^{(m+1)}, r^{(m+1)} \rangle + \frac{\|r^{(m+1)}\|_2^2}{\|r^{(m)}\|_2^2}\|v^{(m)}\|_A^2 = 0,
\end{aligned}
\]

where, at the end, we have made use of statement b) proved above as well as the definition of the $A$-norm of a vector.

d) We need to show $r^{(m+1)} = b - Ax^{(m+1)}$. From line (8) of the algorithm we have
\[
b - Ax^{(m+1)} = b - A\bigl(x^{(m)} + t_m v^{(m)}\bigr) = b - Ax^{(m)} - t_m Av^{(m)} = r^{(m)} - t_m Av^{(m)}
\]
by virtue of the induction hypothesis for d). Using line (9) of the algorithm we can replace $t_m Av^{(m)}$ by a difference of residuals and have
\[
b - Ax^{(m+1)} = r^{(m)} - \bigl(r^{(m)} - r^{(m+1)}\bigr) = r^{(m+1)}.
\]
e) We show $\langle r^{(m+1)}, r^{(i)} \rangle = 0$ for $0 \le i \le m$. From line (12) of the algorithm (with $s_{-1} = 0$, $v^{(-1)} = 0$) we get
\[
\langle r^{(m+1)}, r^{(i)} \rangle = \langle r^{(m+1)}, v^{(i)} - s_{i-1}v^{(i-1)} \rangle
= \underbrace{\langle r^{(m+1)}, v^{(i)} \rangle}_{=0\ \text{by a)}} - s_{i-1}\underbrace{\langle r^{(m+1)}, v^{(i-1)} \rangle}_{=0\ \text{by a)}} = 0.
\]
f) We need to show $r^{(m+1)} \ne 0$. Since $A$ is positive definite, the quadratic form for $v^{(m+1)} \ne 0$ satisfies $\langle v^{(m+1)}, Av^{(m+1)} \rangle > 0$. On the other hand, using line (12) of the algorithm,
\[
\langle v^{(m+1)}, Av^{(m+1)} \rangle = \langle r^{(m+1)} + s_m v^{(m)}, Av^{(m+1)} \rangle
= \langle r^{(m+1)}, Av^{(m+1)} \rangle + s_m\underbrace{\langle v^{(m)}, Av^{(m+1)} \rangle}_{=0\ \text{by c)}}.
\]
Thus, $r^{(m+1)}$ cannot be zero (since $A$ is positive definite and $v^{(m+1)} \ne 0$).

Remark: The algorithms of this section were designed to minimize the quadratic form $q(x) = \langle x, Ax \rangle - 2\langle x, b \rangle$. We will now show they also minimize $\|e^{(k)}\|_A$. Let $e^{(k)} = x - x^{(k)}$, where $x$ is the true solution of the linear system $Ax = b$. Then
\[
\|e^{(k)}\|_A^2 = \langle e^{(k)}, Ae^{(k)} \rangle = \langle x - x^{(k)}, A(x - x^{(k)}) \rangle
= \langle x^{(k)}, Ax^{(k)} \rangle - 2\langle x^{(k)}, \underbrace{Ax}_{=b} \rangle + \langle x, \underbrace{Ax}_{=b} \rangle
= q(x^{(k)}) + \langle x, b \rangle.
\]
Now, since $x$ and $b$ are constant, we see that minimization of $q$ is equivalent to minimization of the $A$-norm of the error.

Preconditioned CG

We mentioned earlier that the number of iterations required for the conjugate gradient algorithm to converge is proportional to $\sqrt{\kappa_2(A)}$. Thus, for poorly conditioned

matrices, convergence will be very slow. A commonly used trick is therefore to precondition the problem. The basic idea can be described as follows. We want to transform the linear system $Ax = b$ into an equivalent system $\hat{A}\hat{x} = \hat{b}$ with $\kappa(\hat{A}) < \kappa(A)$. According to the comments above this should result in faster convergence.

How do we find $\hat{A}$, $\hat{x}$, and $\hat{b}$? Consider a nonsingular $n \times n$ matrix $S$. We can rewrite
\[
Ax = b \iff S^T A x = S^T b \iff \underbrace{S^T A S}_{=\hat{A}}\,\underbrace{S^{-1}x}_{=\hat{x}} = \underbrace{S^T b}_{=\hat{b}}.
\]
Our choice of $S$ should be such that
\[
SS^T = Q^{-1} \tag{29}
\]
with $Q$ a symmetric positive definite matrix called splitting matrix or preconditioner. This will preserve the symmetry and positive definiteness of the system. Before discussing any specific preconditioners we consider how this framework affects the CG algorithm. Formally, the conjugate gradient algorithm for the problem $\hat{A}\hat{x} = \hat{b}$ is

(1) Input $\hat{A}$, $\hat{b}$, $\hat{x}^{(0)}$, $M$, $\epsilon$
(2) $\hat{r}^{(0)} = \hat{b} - \hat{A}\hat{x}^{(0)}$
(3) $\hat{v}^{(0)} = \hat{r}^{(0)}$
(4) Output $0$, $\hat{x}^{(0)}$, $\hat{r}^{(0)}$
(5) for $k = 0$ to $M-1$ do
(6)     if $\hat{v}^{(k)} = 0$ then stop
(7)     $\hat{t}_k = \dfrac{\|\hat{r}^{(k)}\|_2^2}{\|\hat{v}^{(k)}\|_{\hat{A}}^2}$
(8)     $\hat{x}^{(k+1)} = \hat{x}^{(k)} + \hat{t}_k \hat{v}^{(k)}$
(9)     $\hat{r}^{(k+1)} = \hat{r}^{(k)} - \hat{t}_k \hat{A}\hat{v}^{(k)}$
(10)    if $\|\hat{r}^{(k+1)}\|_2 < \epsilon$ then stop
(11)    $\hat{s}_k = \dfrac{\|\hat{r}^{(k+1)}\|_2^2}{\|\hat{r}^{(k)}\|_2^2}$
(12)    $\hat{v}^{(k+1)} = \hat{r}^{(k+1)} + \hat{s}_k \hat{v}^{(k)}$
(13)    Output $k+1$, $\hat{x}^{(k+1)}$, $\hat{r}^{(k+1)}$
(14) end
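As a small check of the transformation (our own sketch, not from the notes): one concrete choice of $S$ with $SS^T = Q^{-1}$ is $S = C^{-T}$, where $Q = CC^T$ is a Cholesky factorization. The sketch below forms the hatted system for a Jacobi preconditioner, solves it (a direct solve stands in for running CG on it), and transforms back.

import numpy as np

A = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]])
b = np.array([1.0, 2.0, 3.0])
Q = np.diag(np.diag(A))                    # Jacobi preconditioner as an example
C = np.linalg.cholesky(Q)                  # Q = C C^T, C lower triangular
S = np.linalg.inv(C).T                     # S = C^{-T}, so S S^T = Q^{-1}, as in (29)
A_hat = S.T @ A @ S                        # transformed (hatted) system matrix
b_hat = S.T @ b
x_hat = np.linalg.solve(A_hat, b_hat)      # stand-in for running CG on the hatted system
x = S @ x_hat                              # transform back: x = S x_hat
print(np.linalg.norm(A @ x - b))           # ~1e-15: the two systems are equivalent
print(np.linalg.cond(A), np.linalg.cond(A_hat))   # better conditioned here: about 3.7 vs 3.0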

Remark: The algorithm is made more efficient if the preconditioning operations are included directly in the algorithm. We therefore make the following substitutions:
\[
\hat{x}^{(k)} = S^{-1}x^{(k)}, \qquad
\hat{r}^{(k)} = \hat{b} - \hat{A}\hat{x}^{(k)} = S^T b - \bigl(S^T A S\bigr)\bigl(S^{-1}x^{(k)}\bigr) = S^T b - S^T A x^{(k)} = S^T r^{(k)},
\]
and also define
\[
\hat{v}^{(k)} = S^{-1}v^{(k)}, \qquad \tilde{r}^{(k)} = Q^{-1}r^{(k)}.
\]
Now we can consider how this transforms the algorithm above. Line (2) becomes
\[
\hat{r}^{(0)} = \hat{b} - \hat{A}\hat{x}^{(0)}
\iff S^T r^{(0)} = S^T b - S^T A S S^{-1} x^{(0)}
\iff r^{(0)} = b - Ax^{(0)},
\]
where we have multiplied by $S^{-T}$ in the last step. For line (3) we have
\[
\hat{v}^{(0)} = \hat{r}^{(0)}
\iff S^{-1}v^{(0)} = S^T r^{(0)}
\iff v^{(0)} = SS^T r^{(0)} = Q^{-1}r^{(0)} = \tilde{r}^{(0)},
\]
where we have multiplied by $S$ and used the definition (29) of the preconditioner $Q$. Line (7) transforms as follows:
\[
\begin{aligned}
\hat{t}_k &= \|\hat{r}^{(k)}\|_2^2 \big/ \|\hat{v}^{(k)}\|_{\hat{A}}^2 \\
&= \|S^T r^{(k)}\|_2^2 \big/ \|S^{-1}v^{(k)}\|_{S^T A S}^2 \\
&= \bigl\langle S^T r^{(k)}, S^T r^{(k)} \bigr\rangle \big/ \bigl\langle S^{-1}v^{(k)}, S^T A S S^{-1}v^{(k)} \bigr\rangle \\
&= \bigl\langle r^{(k)}, \underbrace{SS^T}_{=Q^{-1}} r^{(k)} \bigr\rangle \big/ \bigl\langle v^{(k)}, Av^{(k)} \bigr\rangle \\
&= \bigl\langle r^{(k)}, \tilde{r}^{(k)} \bigr\rangle \big/ \|v^{(k)}\|_A^2.
\end{aligned}
\]
Here we have made use of the easily established fact that for any $n \times n$ matrix $B$,
\[
\langle x, By \rangle = \langle B^T x, y \rangle.
\]
Line (8) becomes
\[
\hat{x}^{(k+1)} = \hat{x}^{(k)} + \hat{t}_k\hat{v}^{(k)}
\iff S^{-1}x^{(k+1)} = S^{-1}x^{(k)} + \hat{t}_k S^{-1}v^{(k)}
\iff x^{(k+1)} = x^{(k)} + \hat{t}_k v^{(k)},
\]
where we have multiplied by $S$ in the last step. For line (9) we have
\[
\hat{r}^{(k+1)} = \hat{r}^{(k)} - \hat{t}_k\hat{A}\hat{v}^{(k)}
\iff S^T r^{(k+1)} = S^T r^{(k)} - \hat{t}_k\bigl(S^T A S\bigr)S^{-1}v^{(k)}
\iff r^{(k+1)} = r^{(k)} - \hat{t}_k Av^{(k)},
\]
where we have multiplied by $S^{-T}$. Line (11) transforms as follows:

\[
\hat{s}_k = \frac{\|\hat{r}^{(k+1)}\|_2^2}{\|\hat{r}^{(k)}\|_2^2}
= \frac{\|S^T r^{(k+1)}\|_2^2}{\|S^T r^{(k)}\|_2^2}
= \frac{\langle r^{(k+1)}, \tilde{r}^{(k+1)} \rangle}{\langle r^{(k)}, \tilde{r}^{(k)} \rangle}.
\]
The last step here requires some of the same manipulations as those used for line (7) above. Finally, for line (12) we have
\[
\hat{v}^{(k+1)} = \hat{r}^{(k+1)} + \hat{s}_k\hat{v}^{(k)}
\iff S^{-1}v^{(k+1)} = S^T r^{(k+1)} + \hat{s}_k S^{-1}v^{(k)}
\iff v^{(k+1)} = SS^T r^{(k+1)} + \hat{s}_k v^{(k)} = Q^{-1}r^{(k+1)} + \hat{s}_k v^{(k)} = \tilde{r}^{(k+1)} + \hat{s}_k v^{(k)},
\]
where we have multiplied by $S$ and used the definition of $Q$ in the penultimate step.

Therefore, the optimized algorithm for the preconditioned conjugate gradient method is

Algorithm (PCG)

Input $A$, $b$, $x^{(0)}$, $Q$, $M$, $\epsilon$
$r^{(0)} = b - Ax^{(0)}$
Solve $Q\tilde{r}^{(0)} = r^{(0)}$ for $\tilde{r}^{(0)}$
$v^{(0)} = \tilde{r}^{(0)}$
Output $0$, $x^{(0)}$, $r^{(0)}$
for $k = 0$ to $M-1$ do
    if $v^{(k)} = 0$ then stop
    $\hat{t}_k = \dfrac{\langle r^{(k)}, \tilde{r}^{(k)} \rangle}{\|v^{(k)}\|_A^2}$
    $x^{(k+1)} = x^{(k)} + \hat{t}_k v^{(k)}$
    $r^{(k+1)} = r^{(k)} - \hat{t}_k Av^{(k)}$
    Solve $Q\tilde{r}^{(k+1)} = r^{(k+1)}$ for $\tilde{r}^{(k+1)}$
    if $\|r^{(k+1)}\|_2^2 < \epsilon$ then stop
    $\hat{s}_k = \dfrac{\langle r^{(k+1)}, \tilde{r}^{(k+1)} \rangle}{\langle r^{(k)}, \tilde{r}^{(k)} \rangle}$
    $v^{(k+1)} = \tilde{r}^{(k+1)} + \hat{s}_k v^{(k)}$
    Output $k+1$, $x^{(k+1)}$, $r^{(k+1)}$
end
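A NumPy sketch of the PCG algorithm above (our own illustration; the general solve of $Q\tilde{r} = r$ would of course be replaced by something that exploits the structure of $Q$):

import numpy as np

def preconditioned_cg(A, b, x0, Q, M=None, eps=1e-10):
    """Preconditioned CG: like CG, except that Q r_tilde = r is solved once
    per iteration and the inner products use <r, r_tilde>."""
    x = x0.astype(float).copy()
    r = b - A @ x
    rt = np.linalg.solve(Q, r)              # solve Q r_tilde = r
    v = rt.copy()
    rho = r @ rt                            # <r^(k), r_tilde^(k)>
    if M is None:
        M = len(b)
    for k in range(M):
        if not np.any(v):
            break
        Av = A @ v
        t = rho / (v @ Av)                  # t_k = <r, r_tilde> / ||v||_A^2
        x = x + t * v
        r = r - t * Av
        if r @ r < eps:                     # stopping test on ||r||_2^2
            return x, k + 1
        rt = np.linalg.solve(Q, r)          # solve Q r_tilde = r for the new residual
        rho_new = r @ rt
        v = rt + (rho_new / rho) * v        # s_k = rho_new / rho
        rho = rho_new
    return x, M

A = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]])
b = np.array([1.0, 2.0, 3.0])
Q = np.diag(np.diag(A))                     # Jacobi preconditioning: Q = diag(A)
print(preconditioned_cg(A, b, np.zeros(3), Q))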

Remark: This algorithm requires the additional work needed to solve the linear system $Q\tilde{r} = r$ once per iteration. Therefore we will want to choose $Q$ so that this can be done easily and efficiently. The two extreme cases $Q = I$ and $Q = A$ are of no interest: $Q = I$ gives us the ordinary CG algorithm, whereas $Q = A$ (with $S = A^{-1/2}$) leads to the trivial preconditioned system $\hat{A}\hat{x} = \hat{b} \iff \hat{x} = \hat{b}$, since (using the symmetry of $A$)
\[
\hat{A} = S^T A S = A^{-T/2} A^{T/2} A^{1/2} A^{-1/2} = I.
\]
This may seem useful at first, but to get the solution $x$ we need
\[
x = S\hat{x} = A^{-1/2}\hat{x} = A^{-1/2}\hat{b} = A^{-1/2}S^T b = A^{-1/2}A^{-T/2}b = A^{-1}b,
\]
which is just as complicated as the original problem. Therefore, $Q$ should be chosen somewhere in between. Moreover, we want:

1. $Q$ should be symmetric and positive definite.

2. $Q$ should be such that $Q\tilde{r} = r$ can be solved efficiently.

3. $Q$ should approximate $A$ (i.e., $Q^{-1}$ should approximate $A^{-1}$) in the sense that $\|I - Q^{-1}A\| < 1$.

Possible choices for Q

If we use the decomposition $A = L + D + L^T$ of the symmetric positive definite matrix $A$, then

$Q = D$ leads to Jacobi preconditioning,
$Q = L + D$ leads to Gauss-Seidel preconditioning,
$Q = \frac{1}{\omega}(D + \omega L)$ leads to SOR preconditioning.

Another popular preconditioner is $Q = HH^T$, where $H$ is close to the Cholesky factor of $A$. This method is referred to as incomplete Cholesky factorization (see the book by Golub and Van Loan for more details).

Remark: The Matlab script PCGDemo.m illustrates the convergence behavior of the preconditioned conjugate gradient algorithm. The matrix $A$ here is a symmetric positive definite matrix with all zeros except $a_{ii} = i$ on the diagonal, $a_{ij} = 1$ on the sub- and superdiagonal, and $a_{ij} = 1$ on the 100th sub- and superdiagonals, i.e., for $|i - j| = 100$. The right-hand side vector is $b = [1, \ldots, 1]^T$. We observe that the basic CG algorithm converges very slowly, whereas the Jacobi-preconditioned method converges much faster.
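The splitting-based preconditioners can be assembled directly from $A = L + D + L^T$. The sketch below (our own, with an assumed test matrix and an arbitrary example value of $\omega$) builds the three choices listed above and prints $\|I - Q^{-1}A\|_2$ for each (cf. requirement 3):

import numpy as np

def classical_preconditioners(A, omega=1.2):
    """Splitting-based preconditioners built from A = L + D + L^T, where L is
    the strictly lower triangular part of A and D = diag(A)."""
    D = np.diag(np.diag(A))
    L = np.tril(A, k=-1)
    return {
        "jacobi": D,                        # Q = D
        "gauss_seidel": L + D,              # Q = L + D
        "sor": (D + omega * L) / omega,     # Q = (1/omega)(D + omega L)
    }

A = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]])
for name, Q in classical_preconditioners(A).items():
    print(name, np.linalg.norm(np.eye(3) - np.linalg.solve(Q, A), 2))
    # prints ||I - Q^{-1}A||_2 for each choice of Q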

An Overview of Iterative Solvers

We end with a decision tree (taken from the book by Demmel) containing rough guidelines for the use of most of the presently available iterative solvers. Branches to the left correspond to "No", branches to the right to "Yes". All methods mentioned belong to the family of Krylov subspace iterations.

[Decision tree omitted. Its decision nodes ask whether $A$ is symmetric, whether $A^T$ is available, whether $A$ is definite, whether storage is expensive, whether $A$ is well-conditioned, and whether $\lambda_{\min}$, $\lambda_{\max}$ are known; the recommended methods at the leaves are GMRES; CGS, Bi-CGStab, GMRES(k); QMR; CGN; MINRES; CG; and CG with Chebyshev acceleration.]

The abbreviations used are:

GMRES: Generalized Minimum RESidual (1986),
CGS: Conjugate Gradient Squared (1989),
Bi-CGStab: Stabilized Bi-Conjugate Gradient (1992),
QMR: Quasi-Minimal Residual (1991),
CGN: Conjugate Gradient applied to the Normal equations,
MINRES: MINimum RESidual.

Remark: A lot more details on iterative solvers can be found in the books by Demmel, Greenbaum, Golub and van Loan, and Trefethen and Bau.


More information

Some definitions. Math 1080: Numerical Linear Algebra Chapter 5, Solving Ax = b by Optimization. A-inner product. Important facts

Some definitions. Math 1080: Numerical Linear Algebra Chapter 5, Solving Ax = b by Optimization. A-inner product. Important facts Some definitions Math 1080: Numerical Linear Algebra Chapter 5, Solving Ax = b by Optimization M. M. Sussman sussmanm@math.pitt.edu Office Hours: MW 1:45PM-2:45PM, Thack 622 A matrix A is SPD (Symmetric

More information

Iterative Solution methods

Iterative Solution methods p. 1/28 TDB NLA Parallel Algorithms for Scientific Computing Iterative Solution methods p. 2/28 TDB NLA Parallel Algorithms for Scientific Computing Basic Iterative Solution methods The ideas to use iterative

More information

Preface to the Second Edition. Preface to the First Edition

Preface to the Second Edition. Preface to the First Edition n page v Preface to the Second Edition Preface to the First Edition xiii xvii 1 Background in Linear Algebra 1 1.1 Matrices................................. 1 1.2 Square Matrices and Eigenvalues....................

More information

Lecture notes: Applied linear algebra Part 1. Version 2

Lecture notes: Applied linear algebra Part 1. Version 2 Lecture notes: Applied linear algebra Part 1. Version 2 Michael Karow Berlin University of Technology karow@math.tu-berlin.de October 2, 2008 1 Notation, basic notions and facts 1.1 Subspaces, range and

More information

Math/Phys/Engr 428, Math 529/Phys 528 Numerical Methods - Summer Homework 3 Due: Tuesday, July 3, 2018

Math/Phys/Engr 428, Math 529/Phys 528 Numerical Methods - Summer Homework 3 Due: Tuesday, July 3, 2018 Math/Phys/Engr 428, Math 529/Phys 528 Numerical Methods - Summer 28. (Vector and Matrix Norms) Homework 3 Due: Tuesday, July 3, 28 Show that the l vector norm satisfies the three properties (a) x for x

More information

Computational Linear Algebra

Computational Linear Algebra Computational Linear Algebra PD Dr. rer. nat. habil. Ralf Peter Mundani Computation in Engineering / BGU Scientific Computing in Computer Science / INF Winter Term 2017/18 Part 3: Iterative Methods PD

More information

Applied Mathematics 205. Unit II: Numerical Linear Algebra. Lecturer: Dr. David Knezevic

Applied Mathematics 205. Unit II: Numerical Linear Algebra. Lecturer: Dr. David Knezevic Applied Mathematics 205 Unit II: Numerical Linear Algebra Lecturer: Dr. David Knezevic Unit II: Numerical Linear Algebra Chapter II.3: QR Factorization, SVD 2 / 66 QR Factorization 3 / 66 QR Factorization

More information

Scientific Computing: An Introductory Survey

Scientific Computing: An Introductory Survey Scientific Computing: An Introductory Survey Chapter 11 Partial Differential Equations Prof. Michael T. Heath Department of Computer Science University of Illinois at Urbana-Champaign Copyright c 2002.

More information

Preconditioning Techniques Analysis for CG Method

Preconditioning Techniques Analysis for CG Method Preconditioning Techniques Analysis for CG Method Huaguang Song Department of Computer Science University of California, Davis hso@ucdavis.edu Abstract Matrix computation issue for solve linear system

More information

Introduction to Iterative Solvers of Linear Systems

Introduction to Iterative Solvers of Linear Systems Introduction to Iterative Solvers of Linear Systems SFB Training Event January 2012 Prof. Dr. Andreas Frommer Typeset by Lukas Krämer, Simon-Wolfgang Mages and Rudolf Rödl 1 Classes of Matrices and their

More information

Conjugate Gradient Tutorial

Conjugate Gradient Tutorial Conjugate Gradient Tutorial Prof. Chung-Kuan Cheng Computer Science and Engineering Department University of California, San Diego ckcheng@ucsd.edu December 1, 2015 Prof. Chung-Kuan Cheng (UC San Diego)

More information

Iterative Methods for Sparse Linear Systems

Iterative Methods for Sparse Linear Systems Iterative Methods for Sparse Linear Systems Luca Bergamaschi e-mail: berga@dmsa.unipd.it - http://www.dmsa.unipd.it/ berga Department of Mathematical Methods and Models for Scientific Applications University

More information

Computational Economics and Finance

Computational Economics and Finance Computational Economics and Finance Part II: Linear Equations Spring 2016 Outline Back Substitution, LU and other decomposi- Direct methods: tions Error analysis and condition numbers Iterative methods:

More information

AMS526: Numerical Analysis I (Numerical Linear Algebra)

AMS526: Numerical Analysis I (Numerical Linear Algebra) AMS526: Numerical Analysis I (Numerical Linear Algebra) Lecture 24: Preconditioning and Multigrid Solver Xiangmin Jiao SUNY Stony Brook Xiangmin Jiao Numerical Analysis I 1 / 5 Preconditioning Motivation:

More information

FEM and Sparse Linear System Solving

FEM and Sparse Linear System Solving FEM & sparse system solving, Lecture 7, Nov 3, 2017 1/46 Lecture 7, Nov 3, 2015: Introduction to Iterative Solvers: Stationary Methods http://people.inf.ethz.ch/arbenz/fem16 Peter Arbenz Computer Science

More information

Notes for CS542G (Iterative Solvers for Linear Systems)

Notes for CS542G (Iterative Solvers for Linear Systems) Notes for CS542G (Iterative Solvers for Linear Systems) Robert Bridson November 20, 2007 1 The Basics We re now looking at efficient ways to solve the linear system of equations Ax = b where in this course,

More information

Math 411 Preliminaries

Math 411 Preliminaries Math 411 Preliminaries Provide a list of preliminary vocabulary and concepts Preliminary Basic Netwon s method, Taylor series expansion (for single and multiple variables), Eigenvalue, Eigenvector, Vector

More information

Lecture 9: Krylov Subspace Methods. 2 Derivation of the Conjugate Gradient Algorithm

Lecture 9: Krylov Subspace Methods. 2 Derivation of the Conjugate Gradient Algorithm CS 622 Data-Sparse Matrix Computations September 19, 217 Lecture 9: Krylov Subspace Methods Lecturer: Anil Damle Scribes: David Eriksson, Marc Aurele Gilles, Ariah Klages-Mundt, Sophia Novitzky 1 Introduction

More information

Math 1080: Numerical Linear Algebra Chapter 4, Iterative Methods

Math 1080: Numerical Linear Algebra Chapter 4, Iterative Methods Math 1080: Numerical Linear Algebra Chapter 4, Iterative Methods M. M. Sussman sussmanm@math.pitt.edu Office Hours: MW 1:45PM-2:45PM, Thack 622 March 2015 1 / 70 Topics Introduction to Iterative Methods

More information

The Solution of Linear Systems AX = B

The Solution of Linear Systems AX = B Chapter 2 The Solution of Linear Systems AX = B 21 Upper-triangular Linear Systems We will now develop the back-substitution algorithm, which is useful for solving a linear system of equations that has

More information

Lecture 9: Numerical Linear Algebra Primer (February 11st)

Lecture 9: Numerical Linear Algebra Primer (February 11st) 10-725/36-725: Convex Optimization Spring 2015 Lecture 9: Numerical Linear Algebra Primer (February 11st) Lecturer: Ryan Tibshirani Scribes: Avinash Siravuru, Guofan Wu, Maosheng Liu Note: LaTeX template

More information

Notes on PCG for Sparse Linear Systems

Notes on PCG for Sparse Linear Systems Notes on PCG for Sparse Linear Systems Luca Bergamaschi Department of Civil Environmental and Architectural Engineering University of Padova e-mail luca.bergamaschi@unipd.it webpage www.dmsa.unipd.it/

More information

APPLIED NUMERICAL LINEAR ALGEBRA

APPLIED NUMERICAL LINEAR ALGEBRA APPLIED NUMERICAL LINEAR ALGEBRA James W. Demmel University of California Berkeley, California Society for Industrial and Applied Mathematics Philadelphia Contents Preface 1 Introduction 1 1.1 Basic Notation

More information

Notes on Eigenvalues, Singular Values and QR

Notes on Eigenvalues, Singular Values and QR Notes on Eigenvalues, Singular Values and QR Michael Overton, Numerical Computing, Spring 2017 March 30, 2017 1 Eigenvalues Everyone who has studied linear algebra knows the definition: given a square

More information

Introduction. Math 1080: Numerical Linear Algebra Chapter 4, Iterative Methods. Example: First Order Richardson. Strategy

Introduction. Math 1080: Numerical Linear Algebra Chapter 4, Iterative Methods. Example: First Order Richardson. Strategy Introduction Math 1080: Numerical Linear Algebra Chapter 4, Iterative Methods M. M. Sussman sussmanm@math.pitt.edu Office Hours: MW 1:45PM-2:45PM, Thack 622 Solve system Ax = b by repeatedly computing

More information