CG VERSUS MINRES: AN EMPIRICAL COMPARISON


DAVID CHIN-LUNG FONG AND MICHAEL SAUNDERS

Report SOL. Submitted October 19, 2011 to SQU Journal for Science.
ICME, Stanford University, Stanford, CA 94305, USA (clfong@stanford.edu). Partially supported by a Stanford Graduate Fellowship.
Systems Optimization Laboratory, Department of Management Science and Engineering, Stanford University, Stanford, CA 94305, USA (saunders@stanford.edu). Partially supported by Office of Naval Research grant N and by the U.S. Army Research Laboratory, through the Army High Performance Computing Research Center, Cooperative Agreement W911NF.

Abstract. For iterative solution of symmetric systems Ax = b, the conjugate gradient method (CG) is commonly used when A is positive definite, while the minimum residual method (MINRES) is typically reserved for indefinite systems. We investigate the sequence of approximate solutions x_k generated by each method and suggest that even if A is positive definite, MINRES may be preferable to CG if iterations are to be terminated early. In particular, we show for MINRES that the solution norms ||x_k|| are monotonically increasing when A is positive definite (as was already known for CG), and the solution errors ||x^* - x_k|| are monotonically decreasing. We also show that the backward errors for the MINRES iterates x_k are monotonically decreasing.

Key words. conjugate gradient method, minimum residual method, iterative method, sparse matrix, linear equations, CG, CR, MINRES, Krylov subspace method, trust-region method

1. Introduction. The conjugate gradient method (CG) [9] and the minimum residual method (MINRES) [16] are both Krylov subspace methods for the iterative solution of symmetric linear equations Ax = b. CG is commonly used when the matrix A is positive definite, while MINRES is generally reserved for indefinite systems [24, p. 85]. We reexamine this wisdom from the point of view of early termination on positive-definite systems.

We assume that the system Ax = b is real with A symmetric positive definite (spd) and of dimension n x n. The Lanczos process [11] with starting vector b may be used to generate the n x k matrix V_k = (v_1 v_2 ... v_k) and the (k+1) x k Hessenberg tridiagonal matrix T_k such that

    A V_k = V_{k+1} T_k   for k = 1, 2, ..., l-1,   and   A V_l = V_l T_l

for some l ≤ n, where the columns of V_k form a theoretically orthonormal basis for the kth Krylov subspace

    K_k(A, b) = span{b, Ab, A^2 b, ..., A^{k-1} b},

and T_l is l x l and tridiagonal. Approximate solutions within the kth Krylov subspace may be formed as x_k = V_k y_k for some k-vector y_k. As shown in [16], three iterative methods, CG, MINRES, and SYMMLQ, may be derived by choosing y_k appropriately at each iteration. CG is well defined if A is spd, while MINRES and SYMMLQ are stable for any symmetric nonsingular A. As noted by Choi [2], SYMMLQ can form an approximation x_{k+1} = V_{k+1} y_{k+1} in the (k+1)th Krylov subspace when CG and MINRES are forming their approximations x_k = V_k y_k in the kth subspace. It would be of future interest to compare all three methods on spd systems, but for the remainder of this paper we focus on CG and MINRES.

With different methods using the same information V_{k+1} and T_k to compute solution estimates x_k = V_k y_k within the same Krylov subspace (for each k), it is commonly thought that the number of iterations required will be similar for each method, and hence CG should be preferable on spd systems because it requires somewhat fewer floating-point operations per iteration. This view is justified if an accurate solution is required (stopping tolerance τ close to machine precision ε). We show that with looser stopping tolerances, MINRES is sure to terminate sooner than CG when the stopping rule is based on the backward error for x_k, and by numerical examples we illustrate that the difference in iteration numbers can be substantial.
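As a concrete illustration of the Lanczos process described above, the following sketch (an assumption for illustration with a dense symmetric A; not the authors' implementation) runs k steps of plain Lanczos and returns V_{k+1} and the (k+1) x k tridiagonal T_k:

    import numpy as np

    def lanczos(A, b, k):
        # k steps of the Lanczos process on symmetric A with starting vector b.
        # Returns V (n x (k+1)) with theoretically orthonormal columns and the
        # (k+1) x k tridiagonal T such that A @ V[:, :k] = V @ T in exact arithmetic.
        n = b.size
        V = np.zeros((n, k + 1))
        T = np.zeros((k + 1, k))
        beta = np.linalg.norm(b)
        V[:, 0] = b / beta
        for j in range(k):
            w = A @ V[:, j]
            if j > 0:
                w = w - T[j, j - 1] * V[:, j - 1]   # subtract beta_j * v_{j-1}
            alpha = V[:, j] @ w
            w = w - alpha * V[:, j]
            beta = np.linalg.norm(w)
            T[j, j] = alpha
            T[j + 1, j] = beta
            if j + 1 < k:
                T[j, j + 1] = beta                  # symmetric tridiagonal structure
            if beta == 0:                           # exact termination (k = l)
                break
            V[:, j + 1] = w / beta
        return V, T

In exact arithmetic the columns of V span K_{k+1}(A, b); in floating point the orthogonality degrades, which is the usual caveat for Lanczos-based solvers.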

1.1. Notation. We study the application of CG and MINRES to real symmetric positive-definite (spd) systems Ax = b. The unique solution is denoted by x^*. The initial approximate solution is x_0 = 0, and r_k = b - A x_k is the residual vector for an approximation x_k within the kth Krylov subspace. For a vector v and matrix A, ||v|| and ||A|| denote the 2-norm and the Frobenius norm respectively, and A ≻ 0 indicates that A is spd.

2. Minimization properties of Krylov subspace methods. With exact arithmetic, the Lanczos process terminates with k = l for some l ≤ n. To ensure that the approximations x_k = V_k y_k improve by some measure as k increases toward l, the Krylov solvers minimize some convex function within the expanding Krylov subspaces [8].

When A is spd, the quadratic form φ(x) = (1/2) x^T A x - b^T x is bounded below, and its unique minimizer solves Ax = b. A characterization of the CG iterations is that they minimize the quadratic form within each Krylov subspace [8], [15, §2.4], [25]:

    x_k^C = V_k y_k^C,   where   y_k^C = arg min_y φ(V_k y).

With b = A x^* and 2φ(x_k) = x_k^T A x_k - 2 x_k^T A x^*, this is equivalent to minimizing the function

    ||x^* - x_k||_A = sqrt( (x^* - x_k)^T A (x^* - x_k) ),

known as the energy norm of the error, within each Krylov subspace. For some applications, this is a desirable property [19, 22, 1, 15, 25].

For nonsingular (and possibly indefinite) systems, the residual norm was used in [16] to characterize the MINRES iterations:

    x_k^M = V_k y_k^M,   where   y_k^M = arg min_y ||b - A V_k y||.                     (2.1)

Thus, MINRES minimizes ||r_k|| within the kth Krylov subspace. This was also an aim of Stiefel's Conjugate Residual method (CR) [21] for spd systems (and of Luenberger's extensions of CR to indefinite systems [13, 14]). Thus, CR and MINRES must generate the same iterates on spd systems. We use this connection to prove that ||x_k|| increases monotonically when MINRES is applied to an spd system.
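The two characterizations can be seen directly in a small computation. The sketch below (a NumPy illustration using a random spd test matrix, not one of the paper's examples) builds an orthonormal basis V for a k-dimensional Krylov subspace and forms the two minimizers from their defining problems; the CG-type iterate wins in the energy norm and the MINRES-type iterate wins in the residual norm:

    import numpy as np

    rng = np.random.default_rng(0)
    n, k = 50, 10
    M = rng.standard_normal((n, n))
    A = M @ M.T + n * np.eye(n)          # random spd test matrix (assumption)
    b = rng.standard_normal(n)
    x_star = np.linalg.solve(A, b)

    # Basis for the kth Krylov subspace span{b, Ab, ..., A^(k-1) b}, orthonormalized.
    K = np.empty((n, k))
    K[:, 0] = b
    for j in range(1, k):
        K[:, j] = A @ K[:, j - 1]
    V, _ = np.linalg.qr(K)

    # CG iterate: minimize phi(V y) = 1/2 y^T (V^T A V) y - (V^T b)^T y.
    x_cg = V @ np.linalg.solve(V.T @ A @ V, V.T @ b)
    # MINRES iterate: minimize ||b - A V y|| over the same subspace, as in (2.1).
    y_mr, *_ = np.linalg.lstsq(A @ V, b, rcond=None)
    x_mr = V @ y_mr

    def energy(x):                       # ||x* - x||_A, the energy norm of the error
        return np.sqrt((x_star - x) @ A @ (x_star - x))

    print("energy norm of error:  CG %.3e  <=  MINRES %.3e" % (energy(x_cg), energy(x_mr)))
    print("residual norm:     MINRES %.3e  <=  CG %.3e"
          % (np.linalg.norm(b - A @ x_mr), np.linalg.norm(b - A @ x_cg)))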

The two methods CG and CR for solving spd systems Ax = b are summarized in Table 2.1. The first two columns are pseudocode for CG and CR with iteration number k omitted for clarity; they match our Matlab implementations. Note that q = Ap in both methods, but it is not computed as such in CR. Termination occurs when r = 0 (so that ρ = β = 0). To prove our main result we need to introduce iteration indices; see column 3 of Table 2.1. Termination occurs when r_k = 0 for some index k = l ≤ n (so that ρ_l = β_l = 0 and r_l = s_l = p_l = q_l = 0). Note: this l is the same as the l at which the Lanczos process theoretically terminates for the given A and b.

    Table 2.1
    Pseudocode for algorithms CG and CR

    CG:
        x = 0, r = b
        ρ = r^T r, p = r
        Repeat:
            q = Ap
            α = ρ / p^T q
            x ← x + α p
            r ← r - α q
            ρ̄ = ρ,  ρ = r^T r
            β = ρ / ρ̄
            p ← r + β p

    CR:
        x = 0, r = b, s = Ar
        ρ = r^T s, p = r, q = s
        Repeat:
            (q = Ap)
            α = ρ / ||q||^2
            x ← x + α p
            r ← r - α q
            s = Ar
            ρ̄ = ρ,  ρ = r^T s
            β = ρ / ρ̄
            p ← r + β p
            q ← s + β q

    CR (with iteration indices):
        x_0 = 0, r_0 = b, s_0 = A r_0
        ρ_0 = r_0^T s_0, p_0 = r_0, q_0 = s_0
        For k = 1, 2, ...:
            (q_{k-1} = A p_{k-1})
            α_k = ρ_{k-1} / ||q_{k-1}||^2
            x_k = x_{k-1} + α_k p_{k-1}
            r_k = r_{k-1} - α_k q_{k-1}
            s_k = A r_k
            ρ_k = r_k^T s_k
            β_k = ρ_k / ρ_{k-1}
            p_k = r_k + β_k p_{k-1}
            q_k = s_k + β_k q_{k-1}

Theorem 2.1. The following properties hold for Algorithm CR:
(a) q_i^T q_j = 0  (i ≠ j)
(b) r_i^T q_j = 0  (i ≥ j + 1)

Proof. Given in [14, Theorem 1].
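A direct translation of the CR column of Table 2.1 into NumPy might look as follows (a sketch; the interface and the residual-based stopping test are assumptions, not the authors' Matlab code):

    import numpy as np

    def conjugate_residual(A, b, tol=1e-8, maxiter=None):
        # Conjugate Residual (CR) iteration for symmetric A, following Table 2.1.
        # Returns the final x and the list of iterates x_1, x_2, ... so that
        # solution norms and backward errors can be inspected afterwards.
        n = b.size
        maxiter = n if maxiter is None else maxiter
        x = np.zeros(n)
        r = b.copy()
        s = A @ r
        rho = r @ s
        p, q = r.copy(), s.copy()            # q = A p, maintained by the recurrences
        iterates = []
        for _ in range(maxiter):
            alpha = rho / (q @ q)
            x = x + alpha * p
            r = r - alpha * q
            iterates.append(x.copy())
            if np.linalg.norm(r) <= tol * np.linalg.norm(b):
                break
            s = A @ r
            rho_old, rho = rho, r @ s
            beta = rho / rho_old
            p = r + beta * p
            q = s + beta * q
        return x, iterates

On spd systems this iteration produces the same iterates as MINRES, which is the connection exploited below.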

Theorem 2.2. The following properties hold for Algorithm CR:
(a) α_i ≥ 0
(b) β_i ≥ 0
(c) p_i^T q_j ≥ 0
(d) p_i^T p_j ≥ 0
(e) x_i^T p_j ≥ 0
(f) r_i^T p_j ≥ 0

Proof.
(a) Here we use the fact that A is spd:

    ρ_i = r_i^T s_i = r_i^T A r_i ≥ 0   (A ≻ 0),                                        (2.2)
    α_i = ρ_{i-1} / ||q_{i-1}||^2 ≥ 0.

The inequalities are strict until i = l (and r_l = 0).

(b) Again by (2.2), β_i = ρ_i / ρ_{i-1} ≥ 0.

(c) Case I: i = j. Then p_i^T q_i = p_i^T A p_i ≥ 0 (A ≻ 0).
Case II: i - j = k > 0. Then

    p_i^T q_j = p_i^T q_{i-k} = r_i^T q_{i-k} + β_i p_{i-1}^T q_{i-k} = β_i p_{i-1}^T q_{i-k} ≥ 0   (by Thm 2.1(b)),

where β_i ≥ 0 by (b) and p_{i-1}^T q_{i-k} ≥ 0 by induction, as (i-1) - (i-k) = k - 1 < k.
Case III: j - i = k > 0. Then

    p_i^T q_j = p_i^T q_{i+k} = p_i^T A p_{i+k} = p_i^T A (r_{i+k} + β_{i+k} p_{i+k-1})
              = q_i^T r_{i+k} + β_{i+k} p_i^T q_{i+k-1} = β_{i+k} p_i^T q_{i+k-1} ≥ 0   (by Thm 2.1(b)),

where β_{i+k} ≥ 0 by (b) and p_i^T q_{i+k-1} ≥ 0 by induction, as (i+k-1) - i = k - 1 < k.

(d) At termination, define P = span{p_0, p_1, ..., p_{l-1}} and Q = span{q_0, ..., q_{l-1}}. By construction, P = span{b, Ab, ..., A^{l-1} b} and Q = span{Ab, ..., A^l b} (since q_i = A p_i). Again by construction, x_l ∈ P, and since r_l = 0 we have b = A x_l, so that b ∈ Q. We see that P ⊆ Q. By Theorem 2.1(a), {q_i / ||q_i||}_{i=0}^{l-1} forms an orthonormal basis for Q. If we project p_i ∈ P ⊆ Q onto this basis, we have

    p_i = Σ_{k=0}^{l-1} (p_i^T q_k / q_k^T q_k) q_k,

where all coordinates are non-negative from (c). Similarly for any other p_j, j < l. Therefore p_i^T p_j ≥ 0 for any i, j < l.

(e) By construction,

    x_i = x_{i-1} + α_i p_{i-1} = ··· = Σ_{k=1}^{i} α_k p_{k-1}   (x_0 = 0).

Therefore x_i^T p_j ≥ 0 by (d) and (a).

(f) Note that any r_i can be expressed as a sum of the vectors q_j:

    r_i = r_{i+1} + α_{i+1} q_i = ··· = r_l + α_l q_{l-1} + ··· + α_{i+1} q_i = α_l q_{l-1} + ··· + α_{i+1} q_i.

Thus we have

    r_i^T p_j = (α_l q_{l-1} + ··· + α_{i+1} q_i)^T p_j ≥ 0,

where the inequality follows from (a) and (c).

We are now able to prove our main theorem about the monotonic increase of ||x_k|| for CR and MINRES. A similar result was proved for CG by Steihaug [19]. We know that ||x_k|| is also monotonic for LSQR [17].

Theorem 2.3. For CR (and hence MINRES) on an spd system Ax = b, ||x_k|| increases monotonically.

Proof.

    ||x_i||^2 - ||x_{i-1}||^2 = 2 α_i x_{i-1}^T p_{i-1} + α_i^2 p_{i-1}^T p_{i-1} ≥ 0,

where the inequality follows from Theorem 2.2 (a), (d) and (e). Therefore ||x_i|| ≥ ||x_{i-1}||.
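Theorem 2.3 (and Theorem 2.4 below) are easy to observe numerically. The sketch below (using SciPy's minres, which coincides with CR on spd systems, and a random spd matrix rather than the paper's test set, both assumptions for illustration) records the iterates and checks the two monotonicity properties:

    import numpy as np
    from scipy.sparse.linalg import minres

    rng = np.random.default_rng(1)
    n = 300
    M = rng.standard_normal((n, n))
    A = M @ M.T + n * np.eye(n)          # random spd test matrix (assumption)
    b = rng.standard_normal(n)
    x_star = np.linalg.solve(A, b)

    xs = []                              # MINRES iterates x_1, x_2, ...
    minres(A, b, maxiter=n, callback=lambda xk: xs.append(xk.copy()))

    sol_norms = np.array([np.linalg.norm(x) for x in xs])
    err_norms = np.array([np.linalg.norm(x_star - x) for x in xs])

    # ||x_k|| should increase monotonically and ||x* - x_k|| should decrease.
    print("solution norms nondecreasing:", bool(np.all(np.diff(sol_norms) >= -1e-10)))
    print("error norms nonincreasing:   ", bool(np.all(np.diff(err_norms) <= 1e-10)))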

Theorem 2.4. For CR (and hence MINRES) on an spd system Ax = b, the error ||x^* - x_k|| decreases monotonically.

Proof. From the update rule for x_k, we can express the final solution x_l = x^* as

    x_l = x_{l-1} + α_l p_{l-1} = ··· = x_k + α_{k+1} p_k + ··· + α_l p_{l-1}
        = x_{k-1} + α_k p_{k-1} + α_{k+1} p_k + ··· + α_l p_{l-1}.

Using the last two equalities above, we can write

    ||x_l - x_{k-1}||^2 - ||x_l - x_k||^2 = (x_l - x_{k-1})^T (x_l - x_{k-1}) - (x_l - x_k)^T (x_l - x_k)
                                          = 2 α_k p_{k-1}^T (α_{k+1} p_k + ··· + α_l p_{l-1}) + α_k^2 p_{k-1}^T p_{k-1} ≥ 0,

where the inequality follows from Theorem 2.2 (a), (d).

3. Backward error analysis. For many physical problems requiring numerical solution, we are given inexact or uncertain input data (in this case A and/or b). It is not justifiable to seek a solution beyond the accuracy of the data [6]. Instead, it is more reasonable to stop an iterative solver once we know that the current approximate solution solves a nearby problem. The measure of "nearby" should match the error in the input data. The design of such stopping rules is an important application of backward error analysis.

For a consistent linear system Ax = b, we think of x_k as coming from the kth iteration of one of the iterative solvers. Following Titley-Péloquin [23] we say that x_k is an acceptable solution if and only if there exist perturbations E and f satisfying

    (A + E) x_k = b + f,   ||E|| ≤ α ||A||,   ||f|| ≤ β ||b||                            (3.1)

for some tolerances α, β that reflect the (preferably known) accuracy of the data. We are naturally interested in minimizing the size of E and f. If we define the optimization problem

    min_{ξ, E, f}  ξ   subject to   (A + E) x_k = b + f,   ||E|| ≤ α ξ ||A||,   ||f|| ≤ β ξ ||b||

to have optimal solution ξ_k, E_k, f_k (all functions of x_k, α, and β), we see that x_k is an acceptable solution if and only if ξ_k ≤ 1. We call ξ_k the normwise relative backward error (NRBE) for x_k.

With r_k = b - A x_k, the optimal solution ξ_k, E_k, f_k is shown in [23] to be

    φ_k = β ||b|| / (α ||A|| ||x_k|| + β ||b||),        E_k = ((1 - φ_k) / ||x_k||^2) r_k x_k^T,      (3.2)
    ξ_k = ||r_k|| / (α ||A|| ||x_k|| + β ||b||),        f_k = -φ_k r_k.                               (3.3)

(See [10, p. 12] for the case β = 0 and [10, §7.1 and p. 336] for the case α = β.)

3.1. Stopping rule. For general tolerances α and β, the condition ξ_k ≤ 1 for x_k to be an acceptable solution becomes

    ||r_k|| ≤ α ||A|| ||x_k|| + β ||b||,                                                  (3.4)

the stopping rule used in LSQR for consistent systems [17, p. 54, rule S1].
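For reference, the optimal NRBE quantities in (3.2)-(3.3) and the perturbation norms below are cheap to evaluate. A minimal sketch, assuming a dense NumPy matrix (the function name and interface are illustrative, not from the paper):

    import numpy as np

    def nrbe(A, b, xk, alpha, beta):
        # Normwise relative backward error quantities from (3.2)-(3.3).
        # Returns (xi_k, ||E_k||, ||f_k||); xk is an acceptable solution when xi_k <= 1.
        # ||A|| is taken as the Frobenius norm, matching the notation of Section 1.1.
        rk = b - A @ xk
        nA = np.linalg.norm(A, 'fro')
        nb, nx, nr = np.linalg.norm(b), np.linalg.norm(xk), np.linalg.norm(rk)
        denom = alpha * nA * nx + beta * nb
        phi_k = beta * nb / denom
        xi_k = nr / denom
        return xi_k, (1.0 - phi_k) * nr / nx, phi_k * nr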

3.2. Monotonic backward errors. Of interest is the size of the perturbations to A and b for which x_k is an exact solution of Ax = b. From (3.2)-(3.3), the perturbations have the following norms:

    ||E_k|| = (1 - φ_k) ||r_k|| / ||x_k|| = α ||A|| ||r_k|| / (α ||A|| ||x_k|| + β ||b||),        (3.5)
    ||f_k|| = φ_k ||r_k|| = β ||b|| ||r_k|| / (α ||A|| ||x_k|| + β ||b||).                          (3.6)

Since ||x_k|| is monotonically increasing for CG and MINRES, we see from (3.2) that φ_k is monotonically decreasing for both solvers. Since ||r_k|| is monotonically decreasing for MINRES (but not for CG), we have the following result.

Theorem 3.1. Suppose α > 0 and β > 0 in (3.1). For CR and MINRES (but not CG), the relative backward errors ||E_k|| / ||A|| and ||f_k|| / ||b|| decrease monotonically.

Proof. This follows from (3.5)-(3.6) with ||x_k|| increasing for both solvers and ||r_k|| decreasing for CR and MINRES but not for CG.

4. Numerical results. Here we compare the convergence of CG and MINRES on various spd systems Ax = b and some associated indefinite systems (A - δI)x = b. The test examples are drawn from the University of Florida Sparse Matrix Collection (Davis [5]). We experimented with all 26 cases for which A is real spd and b is supplied. Since A is spd, we apply diagonal preconditioning by redefining A and b as follows:

    d = diag(A),   D = diag(1./sqrt(d)),   A ← DAD,   b ← Db,   b ← b / ||b||.

Thus in the figures below we have diag(A) = I and ||b|| = 1. The stopping rule used for CG and MINRES was (3.4) with α = 0 and β = 10^{-8} (that is, ||r_k|| ≤ 10^{-8} ||b|| = 10^{-8}), but with a maximum of n iterations.

4.1. Positive-definite systems. In defining backward errors, we assume for simplicity that α > 0 and β = 0 in (3.1)-(3.3), even though this does not match the choice of α and β in the stopping rule (3.4). This gives φ_k = 0 and ||E_k|| = ||r_k|| / ||x_k|| in (3.5). Thus, as in Theorem 3.1, we expect ||E_k|| to decrease monotonically for CR and MINRES but not for CG.

Figure 4.1 shows the backward errors for four representative examples. We see that the ||E_k|| values converge smoothly for MINRES while they fluctuate for CG. For every k, the MINRES backward error is smaller than that of CG, so that MINRES can always be stopped earlier than CG. These figures illustrate our belief that MINRES should be considered for spd systems when high accuracy is not required.

Figure 4.2 shows ||r_k|| and ||x_k|| for CG and MINRES on two typical spd examples. We see that ||x_k|| is monotonically increasing for both solvers, and the ||x_k|| values rise fairly rapidly to their limiting value ||x^*||, with a moderate delay for MINRES.

Figure 4.3 shows ||r_k|| and ||x_k|| for CG and MINRES on two spd examples in which the residual decrease and the solution norm increase are somewhat slower than typical. The rise of ||x_k|| for MINRES is rather more delayed. In the second case, with a looser stopping tolerance β, the final MINRES ||x_k|| would be less than half the exact value ||x^*||. It will be of future interest to evaluate this effect within the context of trust-region methods for optimization.
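The diagonal scaling and stopping test just described can be sketched as follows (a SciPy-based sketch; the helper names are assumptions, and β = 10^{-8} is the tolerance quoted above):

    import numpy as np
    from scipy.sparse import diags

    def diagonally_scale(A, b):
        # Symmetric diagonal scaling of Section 4 (assumes a sparse or dense matrix
        # with positive diagonal): D = diag(1/sqrt(diag(A))), A <- D A D, b <- D b,
        # then b <- b / ||b||, so that diag(A) = I and ||b|| = 1.
        D = diags(1.0 / np.sqrt(A.diagonal()))
        A = D @ A @ D
        b = D @ b
        return A, b / np.linalg.norm(b)

    def accepted(rk_norm, xk_norm, A_norm, b_norm, alpha=0.0, beta=1e-8):
        # Stopping rule (3.4); the experiments use alpha = 0 and beta = 1e-8.
        return rk_norm <= alpha * A_norm * xk_norm + beta * b_norm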

[Figure 4.1: four panels, one per test matrix (Simon_raefsky4, Cannizzo_sts4098, Schenk_AFE_af_shell8, BenElechi_BenElechi1), plotting log10(||r_k|| / ||x_k||) against iteration number k for CG and MINRES.]

Fig. 4.1. Comparison of backward errors for CG and MINRES solving four spd systems Ax = b with n = 19779, 4098, 504855, and 245874. The values of log10(||r_k|| / ||x_k||) are plotted against iteration number k. These values define log10(||E_k||) when the stopping tolerances in (3.4) are α > 0 and β = 0. Upper left: The MINRES backward error converges monotonically while the CG backward error oscillates toward convergence. Upper right: CG and MINRES both converge reasonably well, with MINRES staying a small number of iterations ahead. Lower left: CG converges at the same speed as MINRES until near convergence, but then slows down and takes twice as many iterations as MINRES to converge. Lower right: MINRES stays ahead of CG by two orders of magnitude during most iterations.

[Figure 4.2: four panels for the matrices Simon_olafu and Cannizzo_sts4098; top row log10 ||r_k||, bottom row ||x_k||, plotted against iteration number k for CG and MINRES.]

Fig. 4.2. Comparison of residual and solution norms for CG and MINRES solving two spd systems Ax = b with n = 16146 and 4098. These are typical examples. Top: The values of log10 ||r_k|| are plotted against iteration number k. Bottom: The values of ||x_k|| are plotted against k. The solution norms grow somewhat faster for CG than for MINRES. Both reach the limiting value ||x^*|| significantly before x_k is close to x^*.

[Figure 4.3: four panels for the matrices Schmid_thermal1 and BenElechi_BenElechi1; top row log10 ||r_k||, bottom row ||x_k||, plotted against iteration number k for CG and MINRES.]

Fig. 4.3. Comparison of residual and solution norms for CG and MINRES solving two spd systems Ax = b with n = 82654 and 245874. Sometimes the solution norms take longer to reach the limiting value ||x^*||. Top: The values of log10 ||r_k|| are plotted against iteration number k. Bottom: The values of ||x_k|| are plotted against k. Again the solution norms grow faster for CG.

[Figure 4.4: two panels plotting ||x_k|| (left) and the backward error ||r_k|| / ||x_k|| (right) against iteration number.]

Fig. 4.4. For MINRES on the indefinite problem (4.1), ||x_k|| and the backward error ||r_k|| / ||x_k|| are both slightly non-monotonic.

4.2. Indefinite systems. A key part of Steihaug's trust-region method for large-scale unconstrained optimization [19] (see also [4]) is his proof that when CG is applied to a symmetric (possibly indefinite) system Ax = b, the solution norms ||x_1||, ..., ||x_k|| are strictly increasing as long as p_j^T A p_j > 0 for all iterations 1 ≤ j ≤ k. (We are using the notation in Table 2.1.) From our proof of Theorem 2.2, we see that the same property holds for CR and MINRES as long as both p_j^T A p_j > 0 and r_j^T A r_j > 0 for all iterations 1 ≤ j ≤ k.

In case future research finds that MINRES is a useful solver in the trust-region context, it is of interest now to offer some empirical results about the behavior of ||x_k|| when MINRES is applied to indefinite systems. First, on the small nonsingular indefinite system (4.1), MINRES gives non-monotonic solution norms, as shown in the left plot of Figure 4.4. The decrease in ||x_k|| implies that the backward errors ||r_k|| / ||x_k|| may not be monotonic, as illustrated in the right plot.

More generally, we can gain an impression of the behavior of ||x_k|| by recalling from Choi et al. [3] the connection between MINRES and MINRES-QLP. Both methods compute the iterates x_k^M = V_k y_k^M in (2.1) from the subproblems

    y_k^M = arg min_{y ∈ R^k} ||T_k y - β_1 e_1||,   and possibly   T_l y_l^M = β_1 e_1.

When A is nonsingular or Ax = b is consistent (which we now assume), y_k^M is uniquely defined for each k ≤ l and the methods compute the same iterates x_k^M (but by different numerical methods). In fact they both compute the expanding QR factorizations

    Q_k [ T_k   β_1 e_1 ] = [ R_k   t_k ]
                            [  0    φ_k ]

(with R_k upper tridiagonal), and MINRES-QLP also computes the orthogonal factorizations R_k P_k = L_k (with L_k lower tridiagonal), from which the kth solution estimate is defined by W_k = V_k P_k, L_k u_k = t_k, and x_k^M = W_k u_k.

[Figure 4.5: four panels for the shifted matrices Simon_olafu and Cannizzo_sts4098; top row log10 ||r_k||, bottom row ||x_k||, plotted against iteration number k.]

Fig. 4.5. Residual norms and solution norms when MINRES is applied to two indefinite systems (A - δI)x = b, where A is each of the two spd matrices used in Figure 4.2 (n = 16146 and 4098) and δ = 0.5 is large enough to make the systems indefinite. Top: The values of log10 ||r_k|| are plotted against iteration number k for the first n iterations. Bottom left: The values of ||x_k|| are plotted against k. During the n = 16146 iterations, ||x_k|| increased 83% of the time and the backward errors ||r_k|| / ||x_k|| (not shown here) decreased 96% of the time. Bottom right: During the n = 4098 iterations, ||x_k|| increased 90% of the time and the backward errors ||r_k|| / ||x_k|| (not shown here) decreased 98% of the time.

As shown in [3, §5.3], the construction of these quantities is such that the first k-3 columns of W_k are the same as in W_{k-1}, and the first k-3 elements of u_k are the same as in u_{k-1}. Since W_k has orthonormal columns, ||x_k^M|| = ||u_k||, where the first k-2 elements of u_k are unaltered by later iterations. As shown in [3, §6.5], it means that certain quantities can be cheaply updated to give norm estimates of the form

    χ^2 ← χ^2 + µ̂^2_{k-2},        ||x_k^M||^2 = χ^2 + µ^2_{k-1} + µ^2_k,

where it is clear that χ^2 increases monotonically. Although the last two terms are of unpredictable size, ||x_k^M||^2 tends to be dominated by the monotonic term χ^2 and we can expect that ||x_k^M|| will be approximately monotonic as k increases from 1 to l.

Experimentally we find that for most MINRES iterations on an indefinite problem, ||x_k|| does increase. To obtain indefinite examples that were sensibly scaled, we used the four spd (A, b) cases in Figures 4.2 and 4.3, applied diagonal scaling as before, and solved (A - δI)x = b with δ = 0.5, where A and b are now the scaled data (so that diag(A) = I).

[Figure 4.6: four panels for the shifted matrices Schmid_thermal1 and BenElechi_BenElechi1; top row log10 ||r_k||, bottom row ||x_k||, plotted against iteration number k.]

Fig. 4.6. Residual norms and solution norms when MINRES is applied to two indefinite systems (A - δI)x = b, where A is each of the two spd matrices used in Figure 4.3 (n = 82654 and 245874) and δ = 0.5 is large enough to make the systems indefinite. Top: The values of log10 ||r_k|| are plotted against iteration number k for the first n iterations. Bottom left: The values of ||x_k|| are plotted against k. There is a mild but clear decrease in ||x_k|| over an interval of about 1000 iterations. During the n = 82654 iterations, ||x_k|| increased 83% of the time and the backward errors ||r_k|| / ||x_k|| (not shown here) decreased 91% of the time. Bottom right: The solution norms and backward errors are essentially monotonic. During the n = 245874 iterations, ||x_k|| increased 88% of the time and the backward errors ||r_k|| / ||x_k|| (not shown here) decreased 95% of the time.

The number of iterations increased significantly but was limited to n. Figure 4.5 shows log10 ||r_k|| and ||x_k|| for the first two cases (where A is one of the spd matrices in Figure 4.2). The values of ||x_k|| are essentially monotonic. The backward errors ||r_k|| / ||x_k|| (not shown) were even closer to being monotonic.

Figure 4.6 shows ||x_k|| and log10 ||r_k|| for the second two cases (where A is one of the spd matrices in Figure 4.3). The left example reveals a definite period of decrease in ||x_k||. Nevertheless, during the n = 82654 iterations, ||x_k|| increased 83% of the time and the backward errors ||r_k|| / ||x_k|| decreased 91% of the time. The right example is more like those in Figure 4.5. During the n = 245874 iterations, ||x_k|| increased 83% of the time, the backward errors ||r_k|| / ||x_k|| decreased 91% of the time, and any nonmonotonicity was very slight.
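The shifted-matrix experiment is easy to reproduce in outline. The sketch below (with a random test matrix standing in for the UF collection examples, which is an assumption) applies the same diagonal scaling, shifts by δ = 0.5, runs MINRES, and reports how often ||x_k|| increased and how often the backward error decreased:

    import numpy as np
    from scipy.sparse.linalg import minres

    rng = np.random.default_rng(2)
    n = 400
    M = rng.standard_normal((n, n)) / np.sqrt(n)
    A = M @ M.T + 0.1 * np.eye(n)            # spd, eigenvalues roughly in (0.1, 4)
    d = 1.0 / np.sqrt(np.diag(A))
    A = A * np.outer(d, d)                   # diagonal scaling so that diag(A) = I
    b = rng.standard_normal(n)
    b /= np.linalg.norm(b)                   # ||b|| = 1
    delta = 0.5
    A_shift = A - delta * np.eye(n)          # shifted system (A - delta I), now indefinite

    xs = []
    minres(A_shift, b, maxiter=n, callback=lambda xk: xs.append(xk.copy()))
    xnorm = np.array([np.linalg.norm(x) for x in xs])
    berr = np.array([np.linalg.norm(b - A_shift @ x) / np.linalg.norm(x) for x in xs])
    print("||x_k|| increased on %.0f%% of iterations" % (100 * np.mean(np.diff(xnorm) > 0)))
    print("backward error decreased on %.0f%% of iterations" % (100 * np.mean(np.diff(berr) < 0)))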

5. Conclusions. For full-rank least-squares problems min ||Ax - b||, the solvers LSQR [17, 18] and LSMR [7, 12] are equivalent to CG and MINRES (respectively) on the (spd) normal equation A^T A x = A^T b. Comparisons in [7] already indicated that LSMR can often stop much sooner than LSQR when Stewart's backward error norm for least-squares problems (||A^T r_k|| / ||r_k|| [20]) is used in both cases. Our theoretical and experimental results here provide analogous evidence that MINRES can often stop much sooner than CG on spd systems when the stopping rule is based on the backward error norms ||r_k|| / ||x_k|| (or the more general norms in (3.5)-(3.6)).

By definition, MINRES minimizes ||r_k|| in the kth Krylov subspace, so that ||r_k|| decreases monotonically. Theorem 2.3 shows that MINRES shares a known property of CG: that ||x_k|| increases monotonically when A is spd. This implies that ||x_k|| is monotonic for LSMR (as conjectured in [7]), and suggests that MINRES may be a useful alternative to CG in the context of trust-region methods for optimization. Finally, while the energy norms of the CG errors (||x^* - x_k||_A) are known to be monotonic, Theorem 2.4 shows that for MINRES on spd systems the error norms ||x^* - x_k|| are monotonic.

Acknowledgements. We kindly thank Professor Mehiddin Al-Baali and other colleagues for organizing the Second International Conference on Numerical Analysis and Optimization at Sultan Qaboos University (January 3-6, 2011, Muscat, Sultanate of Oman). Their wish to publish some of the conference papers in a special issue of SQU Journal for Science gave added motivation for this research. We also thank Dr Sou-Cheng Choi for many helpful discussions of the iterative solvers, and Michael Friedlander for a final welcome tip.

REFERENCES

[1] M. Arioli, A stopping criterion for the conjugate gradient algorithm in a finite element method framework, Numer. Math., 97 (2004).
[2] S.-C. Choi, Iterative Methods for Singular Linear Equations and Least-Squares Problems, PhD thesis, Stanford University, Stanford, CA, December 2006.
[3] S.-C. Choi, C. C. Paige, and M. A. Saunders, MINRES-QLP: a Krylov subspace method for indefinite or singular symmetric systems, SIAM J. Sci. Comput., 33 (2011).
[4] A. R. Conn, N. I. M. Gould, and Ph. L. Toint, Trust-Region Methods, vol. 1, SIAM, Philadelphia, 2000.
[5] T. A. Davis, University of Florida Sparse Matrix Collection, http://www.cise.ufl.edu/research/sparse/matrices.
[6] J. J. Dongarra, J. R. Bunch, C. B. Moler, and G. W. Stewart, LINPACK Users' Guide, SIAM, Philadelphia, 1979.
[7] D. C.-L. Fong and M. A. Saunders, LSMR: An iterative algorithm for sparse least-squares problems, SIAM J. Sci. Comput., to appear (2011).
[8] R. W. Freund, G. H. Golub, and N. M. Nachtigal, Iterative solution of linear systems, Acta Numerica, 1 (1992).
[9] M. R. Hestenes and E. Stiefel, Methods of conjugate gradients for solving linear systems, J. Res. Nat. Bur. Standards, 49 (1952).
[10] N. J. Higham, Accuracy and Stability of Numerical Algorithms, second ed., SIAM, Philadelphia, 2002.
[11] C. Lanczos, An iteration method for the solution of the eigenvalue problem of linear differential and integral operators, J. Res. Nat. Bur. Standards, 45 (1950).
[12] LSMR software for linear systems and least squares, software.html.
[13] D. G. Luenberger, Hyperbolic pairs in the method of conjugate gradients, SIAM J. Appl. Math., 17 (1969).
[14] D. G. Luenberger, The conjugate residual method for constrained minimization problems, SIAM J. Numer. Anal., 7 (1970).
[15] G. A. Meurant, The Lanczos and Conjugate Gradient Algorithms: From Theory to Finite Precision Computations, vol. 19 of Software, Environments, and Tools, SIAM, Philadelphia, 2006.

[16] C. C. Paige and M. A. Saunders, Solution of sparse indefinite systems of linear equations, SIAM J. Numer. Anal., 12 (1975).
[17] C. C. Paige and M. A. Saunders, LSQR: An algorithm for sparse linear equations and sparse least squares, ACM Trans. Math. Softw., 8 (1982).
[18] C. C. Paige and M. A. Saunders, Algorithm 583; LSQR: Sparse linear equations and least-squares problems, ACM Trans. Math. Softw., 8 (1982).
[19] T. Steihaug, The conjugate gradient method and trust regions in large scale optimization, SIAM J. Numer. Anal., 20 (1983).
[20] G. W. Stewart, Research, development and LINPACK, in Mathematical Software III, J. R. Rice, ed., Academic Press, New York, 1977.
[21] E. Stiefel, Relaxationsmethoden bester Strategie zur Lösung linearer Gleichungssysteme, Comm. Math. Helv., 29 (1955).
[22] Y. Sun, The Filter Algorithm for Solving Large-Scale Eigenproblems from Accelerator Simulations, PhD thesis, Stanford University, Stanford, CA, March 2003.
[23] D. Titley-Péloquin, Backward Perturbation Analysis of Least Squares Problems, PhD thesis, School of Computer Science, McGill University, 2010.
[24] H. A. van der Vorst, Iterative Krylov Methods for Large Linear Systems, Cambridge University Press, Cambridge, first ed., 2003.
[25] D. S. Watkins, Fundamentals of Matrix Computations, Pure and Applied Mathematics, third ed., Wiley, Hoboken, NJ, 2010.
