ITERATIVE METHODS BASED ON KRYLOV SUBSPACES

LONG CHEN

Date: March 6, 2016.

We shall present iterative methods for solving the linear algebraic equation Au = b based on Krylov subspaces. We derive the conjugate gradient (CG) method, developed by Hestenes and Stiefel in the 1950s [1] for symmetric and positive definite matrices A, and briefly mention the GMRES method [3] for general non-symmetric systems.

1. CONJUGATE GRADIENT METHOD

In this section we introduce the conjugate gradient (CG) method for solving Au = b with a symmetric and positive definite (SPD) operator A defined on a Hilbert space V with dim V = N, together with its preconditioned version PCG. We derive CG from the point of view of projection in the A-inner product. We use (·,·) for the standard inner product and (·,·)_A for the inner product induced by the SPD operator A. When we speak of orthogonal vectors, we refer to the default inner product (·,·), and we emphasize orthogonality in (·,·)_A by calling it A-orthogonality.

1.1. Gradient Method. Recall that the simplest iterative method, the Richardson method, has the form

    u_{k+1} = u_k + α r_k,

where α is a fixed constant. The correction α r_k can be thought of as an approximation to the solution of the residual equation A e_k = r_k. Now we solve the residual equation in the one-dimensional subspace spanned by r_k, i.e., we look for ê_k = α_k r_k such that

    A(α_k r_k) = r_k   in span{r_k}.

Testing with r_k, we get

(1)  α_k = (r_k, r_k)/(A r_k, r_k) = (u - u_k, r_k)_A/(r_k, r_k)_A.

The iterative method

    u_{k+1} = u_k + α_k r_k

is called the gradient method. Since α_k is a nonlinear function of u_k, the method is nonlinear.

The second identity in (1) shows that the correction ê_k = α_k r_k is the projection of the error e_k = u - u_k onto the space span{r_k} in the A-inner product. From this we immediately get the A-orthogonality

(2)  (e_k - ê_k, r_k)_A = 0,

which can be rewritten as

(3)  (u - u_{k+1}, r_k)_A = (r_{k+1}, r_k) = 0.

Namely, consecutive residuals are orthogonal.
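As an illustration only (this listing is not part of the original notes; the function name, argument list, and stopping rule are made-up choices), a minimal MATLAB sketch of the gradient method with the exact line search (1) might look as follows:

    function u = gradientmethod(A, b, u, tol, maxit)
    % Gradient (steepest descent) method for SPD A; a sketch, not the notes' code.
    r = b - A*u;                     % initial residual
    for k = 1:maxit
        if norm(r) <= tol*norm(b), break; end
        Ar = A*r;
        alpha = (r'*r)/(Ar'*r);      % exact line search (1)
        u = u + alpha*r;             % u_{k+1} = u_k + alpha_k r_k
        r = r - alpha*Ar;            % r_{k+1} = r_k - alpha_k A r_k
    end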

Remark 1.1. The name gradient method comes from the minimization point of view. Let E(u) = ‖u‖_A^2/2 - (b, u) and consider the optimization problem min_{u∈V} E(u). Given a current approximation u_k, the descent direction is the negative gradient -∇E(u_k) = b - A u_k = r_k. The optimal step size α_k is obtained from the one-dimensional minimization problem min_α E(u_k + α r_k). The residual is simply the negative gradient of the energy.

We now give a convergence analysis of the gradient method to show that it converges as fast as the optimal Richardson method. Recall that in the Richardson method the optimal choice α = 2/(λ_min(A) + λ_max(A)) requires information on the eigenvalues of A, while in the gradient method α_k is computed using the action of A only. The optimality is built in through the optimization of the step size (the so-called exact line search).

Theorem 1.2. Let A be SPD and let u_k be the k-th iterate of the gradient method with initial guess u_0. Then

(4)  ‖u - u_k‖_A ≤ ((κ(A) - 1)/(κ(A) + 1))^k ‖u - u_0‖_A,

where κ(A) = λ_max(A)/λ_min(A) is the condition number of A.

Proof. Since we solve the residual equation in the subspace span{r_k} exactly, or in other words α_k r_k is the A-projection of u - u_k, we have

    ‖u - u_{k+1}‖_A = ‖e_k - ê_k‖_A = inf_{α∈R} ‖e_k - α r_k‖_A = inf_{α∈R} ‖(I - αA)(u - u_k)‖_A.

Since I - αA is symmetric, ‖I - αA‖_A = ρ(I - αA). Consequently

    ‖u - u_{k+1}‖_A ≤ inf_{α∈R} ρ(I - αA) ‖u - u_k‖_A = ((κ(A) - 1)/(κ(A) + 1)) ‖u - u_k‖_A.

The proof is completed by recursion on k.

The level set of ‖x‖_A^2 is an ellipsoid, and the condition number of A is the ratio of the major and minor axes of this ellipsoid. When κ(A) ≫ 1, the ellipsoid is very skinny and the gradient method follows a long zig-zag path towards the solution. For example, for κ(A) = 100 the contraction factor is 99/101 ≈ 0.98, so roughly 230 iterations are needed to reduce the error by a factor of 100.

1.2. Conjugate Gradient Method. Tracing back to the initial guess u_0, the (k+1)-th approximation of the gradient method can be written as

    u_{k+1} = u_0 + α_0 r_0 + α_1 r_1 + ... + α_k r_k.

Let

    V_k = span{r_0, r_1, ..., r_k}.

The correction Σ_{i=0}^{k} α_i r_i can be thought of as an approximate solution of the residual equation

    A e_0 = r_0   in V_k,

obtained by computing the coefficient along each basis vector r_0, r_1, ..., r_k separately. It is, however, not the best approximation of e_0 = u - u_0 in the subspace V_k, since the basis {r_0, r_1, ..., r_k} is not A-orthogonal. If we can find an A-orthogonal basis {p_0, p_1, ..., p_k} of V_k, i.e., (p_i, p_j)_A = 0 for i ≠ j, then the best approximation, i.e., the A-projection P_k e_0 ∈ V_k, is given by the formulae

    P_k e_0 = Σ_{i=0}^{k} α_i p_i,   α_i = (e_0, p_i)_A/(p_i, p_i)_A,   for i = 0, ..., k.
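As a small illustration of this projection formula (added here, not in the original; the function name and interface are hypothetical), one could A-orthogonalize an arbitrary basis by Gram-Schmidt in the (·,·)_A inner product and then assemble the A-projection term by term:

    function Pe = Aprojection(A, V, e)
    % Sketch: A-orthogonalize the columns of V by Gram-Schmidt in the A-inner
    % product, then form the A-projection of e onto span(V) via the formula above.
    m = size(V, 2);
    P = zeros(size(V));
    for i = 1:m
        p = V(:, i);
        for j = 1:i-1                % remove A-components along p_1, ..., p_{i-1}
            p = p - ((P(:,j)'*(A*p))/(P(:,j)'*(A*P(:,j)))) * P(:, j);
        end
        P(:, i) = p;
    end
    Pe = zeros(size(e));
    for i = 1:m                      % Pe = sum_i ((e,p_i)_A/(p_i,p_i)_A) p_i
        Pe = Pe + ((P(:,i)'*(A*e))/(P(:,i)'*(A*P(:,i)))) * P(:, i);
    end

The discussion below shows that, for the Krylov basis generated by CG, this full Gram-Schmidt loop collapses to a three-term recursion.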

Let u_k = u_0 + Σ_{i=0}^{k-1} α_i p_i for k ≥ 1. Then u_{k+1} can be computed recursively by

    u_{k+1} = u_k + α_k p_k,   α_k = (u - u_0, p_k)_A/(p_k, p_k)_A.

The residual

    r_{k+1} = b - A u_{k+1} = b - A u_k - α_k A p_k = r_k - α_k A p_k

can also be updated recursively. The vector p_k is called the conjugate direction. It is better than the gradient direction since it is A-orthogonal to all previous directions, while the gradient direction is only orthogonal (not even A-orthogonal) to the previous one.

The questions are:
- How to find such an A-orthogonal basis?
- Which A-orthogonal basis is better in terms of storage and computation?

The answer to the first question is easy: the Gram-Schmidt process. We begin with p_0 = r_0, update u_1 = u_0 + α_0 p_0, and compute the new residual r_1 = b - A u_1, or update r_1 = r_0 - α_0 A p_0. If r_1 = 0, we have obtained the exact solution and stop. Otherwise we have two vectors {p_0, r_1} and apply the Gram-Schmidt process to obtain

    p_1 = r_1 + β_0 p_0,   β_0 = -(r_1, p_0)_A/(p_0, p_0)_A,

which is A-orthogonal to p_0. We can then update u_2 = u_1 + α_1 p_1 and continue.

In general, suppose we have found an A-orthogonal basis {p_0, p_1, ..., p_k} of V_k. We update the approximation u_{k+1} = u_k + α_k p_k and the residual r_{k+1} = r_k - α_k A p_k. Again we apply the Gram-Schmidt process to look for the next conjugate direction

    p_{k+1} = r_{k+1} + β_k p_k + γ_{k-1} p_{k-1} + ... + γ_0 p_0.

The coefficients can be computed from the A-orthogonality (p_{k+1}, p_i)_A = 0 for 0 ≤ i ≤ k; in particular γ_i = -(r_{k+1}, p_i)_A/(p_i, p_i)_A. If we had to compute all these coefficients, we would need to store all previous basis vectors, which is efficient in neither time nor space. The amazing fact is that all γ_i, 0 ≤ i ≤ k - 1, are zero, so we can simply compute the new conjugate direction as

(5)  p_{k+1} = r_{k+1} + β_k p_k,   β_k = -(r_{k+1}, p_k)_A/(p_k, p_k)_A.

The formula (5) is justified by the following orthogonality.

Lemma 1.3. The residual r_{k+1} is (·,·)-orthogonal to V_k and A-orthogonal to V_{k-1}.

Proof. Since {p_0, p_1, ..., p_k} is an A-orthogonal basis of V_k, the formula u_{k+1} - u_0 = Σ_{i=0}^{k} α_i p_i, as mentioned before, computes the A-projection of the error, i.e., u_{k+1} - u_0 = P_k(u - u_0). We write u - u_{k+1} = u - u_0 - (u_{k+1} - u_0) = (I - P_k)(u - u_0). Therefore u - u_{k+1} is A-orthogonal to V_k, i.e., r_{k+1} = A(u - u_{k+1}) ⊥ V_k. The first orthogonality is proved.

By the update formula for the residual, r_i = r_{i-1} - α_{i-1} A p_{i-1}, we get A p_{i-1} ∈ span{r_{i-1}, r_i} ⊆ V_k for i ≤ k. Since we have proved r_{k+1} ⊥ V_k, we get (r_{k+1}, p_i)_A = (r_{k+1}, A p_i) = 0 for i ≤ k - 1, i.e., r_{k+1} is A-orthogonal to V_{k-1}. In particular, all the coefficients γ_i above vanish.

This lemma shows the advantage of the conjugate gradient method over the gradient method: the new residual is orthogonal to the whole space V_k, not only to the single residual of the previous step. This additional orthogonality reduces the Gram-Schmidt process to a three-term recursion.
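As a quick numerical sanity check (added; not part of the notes, and the test matrix is an arbitrary choice), one can run a few CG steps explicitly with the two-term recursion (5) and verify that the resulting directions are mutually A-orthogonal:

    % Run three CG steps and check that P'*A*P is (numerically) diagonal.
    A = gallery('poisson', 5);            % 25x25 SPD test matrix
    b = ones(25, 1);  u = zeros(25, 1);
    r = b - A*u;  p = r;
    P = [];                               % collect the conjugate directions
    for k = 1:3
        Ap = A*p;
        alpha = (r'*p)/(p'*Ap);           % computable form of alpha_k
        u = u + alpha*p;
        rnew = r - alpha*Ap;
        beta = -(rnew'*Ap)/(p'*Ap);       % beta_k = -(r_{k+1},p_k)_A/(p_k,p_k)_A
        P = [P, p];                       % store the current direction
        p = rnew + beta*p;                % two-term recursion (5)
        r = rnew;
    end
    disp(P'*A*P)                          % off-diagonal entries are ~ round-off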

In summary, starting from an initial guess u_0 and p_0 = r_0, for k = 0, 1, 2, ..., we use three recursion formulae to compute

    u_{k+1} = u_k + α_k p_k,   α_k = (u - u_0, p_k)_A/(p_k, p_k)_A,
    r_{k+1} = r_k - α_k A p_k,
    p_{k+1} = r_{k+1} + β_k p_k,   β_k = -(r_{k+1}, p_k)_A/(p_k, p_k)_A.

The method is called the conjugate gradient (CG) method since the conjugate direction is obtained by a correction of the gradient (residual) direction. The importance of the gradient direction lies in the orthogonality given in Lemma 1.3.

We now derive more efficient formulae for α_k and β_k.

Proposition 1.4.

    α_k = (r_k, r_k)/(A p_k, p_k),   β_k = (r_{k+1}, r_{k+1})/(r_k, r_k).

Proof. First, since r_k = r_0 - Σ_{i=0}^{k-1} α_i A p_i and (p_k, A p_i) = 0 for 0 ≤ i ≤ k - 1, we have

    (u - u_0, p_k)_A = (r_0, p_k) = (r_k, p_k).

Then, since p_k = r_k + β_{k-1} p_{k-1} and (r_k, p_{k-1}) = 0, we have (r_k, p_k) = (r_k, r_k). Recall that α_k is the coefficient of u - u_0 corresponding to the basis vector p_k, i.e.,

    α_k = (u - u_0, p_k)_A/(p_k, p_k)_A = (r_k, r_k)/(A p_k, p_k).

Second, to prove the formula for β_k, we use the recursion formula for the residual to get

    (r_{k+1}, A p_k) = -(1/α_k)(r_{k+1}, r_{k+1}).

Then, using the formula just proved for α_k, we replace (A p_k, p_k) = (1/α_k)(r_k, r_k) to get

    β_k = -(r_{k+1}, p_k)_A/(p_k, p_k)_A = -(r_{k+1}, A p_k)/(A p_k, p_k) = (r_{k+1}, r_{k+1})/(r_k, r_k).

We present the algorithm of the conjugate gradient method as follows.

    function u = CG(A,b,u,tol)
    % Conjugate gradient method for an SPD matrix A: solve A*u = b starting from u.
    tol = tol*norm(b);
    k = 1;
    r = b - A*u;
    p = r;
    r2 = r'*r;
    while sqrt(r2) > tol && k < length(b)
        Ap = A*p;
        alpha = r2/(p'*Ap);          % alpha_k = (r_k,r_k)/(A p_k,p_k)
        u = u + alpha*p;
        r = r - alpha*Ap;
        r2old = r2;
        r2 = r'*r;
        beta = r2/r2old;             % beta_k = (r_{k+1},r_{k+1})/(r_k,r_k)
        p = r + beta*p;
        k = k + 1;
    end
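A possible way to call this routine (added for illustration; the test matrix, tolerance, and the assumption that the listing is saved as CG.m are my own choices):

    A  = gallery('poisson', 10);     % 100x100 SPD matrix (5-point Laplacian)
    b  = ones(size(A,1), 1);
    u0 = zeros(size(A,1), 1);
    u  = CG(A, b, u0, 1e-8);
    fprintf('relative residual = %g\n', norm(b - A*u)/norm(b));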

Several remarks on the realization are listed below:
- The most time-consuming part is the matrix-vector multiplication A*p.
- The error is measured by the relative residual, ‖r‖ ≤ tol ‖b‖.
- A maximum number of iterations is imposed to avoid an excessively long loop.

1.3. Convergence Analysis of the Conjugate Gradient Method. Recall that

    V_k = span{r_0, r_1, ..., r_k} = span{p_0, p_1, ..., p_k}.

An important observation is the following.

Lemma 1.5. V_k = span{r_0, A r_0, ..., A^k r_0}.

Proof. We prove it by induction. For k = 0 the statement is trivial. Suppose it holds for k = i; we prove it for k = i + 1 by noting that

    V_{i+1} = V_i + span{r_{i+1}} = V_i + span{r_i - α_i A p_i} = V_i + span{A p_i} = V_i + A V_i,

and by the induction hypothesis V_i + A V_i = span{r_0, A r_0, ..., A^{i+1} r_0}.

The space span{r_0, A r_0, ..., A^k r_0} is called a Krylov subspace, and the CG method belongs to the large class of Krylov subspace iterative methods.

By construction, u_k = u_0 + P_{k-1}(u - u_0). Writing u - u_k = u - u_0 - (u_k - u_0), we have

(6)  ‖u - u_k‖_A = inf_{v ∈ V_{k-1}} ‖u - u_0 + v‖_A.

Theorem 1.6. Let A be SPD and let u_k be the k-th iterate of the CG method with initial guess u_0. Then

(7)  ‖u - u_k‖_A = inf_{p_k ∈ P_k, p_k(0)=1} ‖p_k(A)(u - u_0)‖_A,

(8)  ‖u - u_k‖_A ≤ inf_{p_k ∈ P_k, p_k(0)=1} sup_{λ ∈ σ(A)} |p_k(λ)| ‖u - u_0‖_A.

Proof. Any v ∈ V_{k-1} can be expanded as

    v = Σ_{i=0}^{k-1} c_i A^i r_0 = Σ_{i=1}^{k} c_{i-1} A^i (u - u_0).

Let p_k(x) = 1 + Σ_{i=1}^{k} c_{i-1} x^i. Then

    u - u_0 + v = p_k(A)(u - u_0).

The identity (7) then follows from (6). Since A^i is symmetric in the A-inner product, we have

    ‖p_k(A)‖_A = ρ(p_k(A)) = sup_{λ ∈ σ(A)} |p_k(λ)|,

which leads to the estimate (8).

The polynomial p_k ∈ P_k with the constraint p_k(0) = 1 is called a residual polynomial. Various convergence results for the CG method can be obtained by choosing specific residual polynomials; for example, the choice p_k(x) = (1 - 2x/(λ_min(A) + λ_max(A)))^k in (8) already recovers the gradient-method bound (4).

Corollary 1.7. Let A be SPD with eigenvectors {φ_i}_{i=1}^{N}. Let b ∈ span{φ_{i_1}, φ_{i_2}, ..., φ_{i_k}}, k ≤ N. Then the CG method with u_0 = 0 will find the solution of Au = b in at most k iterations. In particular, for a given b ∈ R^N, the CG algorithm will find the solution of Au = b within N iterations; namely, CG can be viewed as a direct method.

Proof. Choose the polynomial p(x) = Π_{l=1}^{k} (1 - x/λ_{i_l}). Then p is a residual polynomial of order k with p(0) = 1 and p(λ_{i_l}) = 0. Let b = Σ_{l=1}^{k} b_l φ_{i_l}. Then u = Σ_{l=1}^{k} u_l φ_{i_l} with u_l = b_l/λ_{i_l}, and

    p(A) Σ_{l=1}^{k} u_l φ_{i_l} = Σ_{l=1}^{k} u_l p(λ_{i_l}) φ_{i_l} = 0.

The assertion then follows from the identity (7).

Remark 1.8. The CG method can also be applied to a symmetric and positive semi-definite matrix A. Let {φ_i}_{i=1}^{k} be the eigenvectors associated with λ_min(A) = 0. By Corollary 1.7, if b ∈ range(A) = span{φ_{k+1}, φ_{k+2}, ..., φ_N}, then the CG method with u_0 ∈ range(A) will find a solution of Au = b within N - k iterations.

CG was invented as a direct method, but it is more effective when used as an iterative method. The rate of convergence depends crucially on the distribution of the eigenvalues of A, and the iteration may reach a given tolerance within k ≪ N steps.

Theorem 1.9. Let u_k be the k-th iterate of the CG method with initial guess u_0. Then

(9)  ‖u - u_k‖_A ≤ 2 ((√κ(A) - 1)/(√κ(A) + 1))^k ‖u - u_0‖_A.

Proof. Let b = λ_max(A) and a = λ_min(A). Choose

    p_k(x) = T_k((b + a - 2x)/(b - a)) / T_k((b + a)/(b - a)),

where T_k is the Chebyshev polynomial of degree k, given by

    T_k(x) = cos(k arccos x) if |x| ≤ 1,   T_k(x) = cosh(k arccosh x) if |x| ≥ 1.

It is easy to see that p_k(0) = 1, and if x ∈ [a, b] then |(b + a - 2x)/(b - a)| ≤ 1. Hence

    inf_{p_k ∈ P_k, p_k(0)=1} sup_{λ ∈ σ(A)} |p_k(λ)| ≤ [T_k((b + a)/(b - a))]^{-1}.

We set

    (b + a)/(b - a) = cosh σ = (e^σ + e^{-σ})/2.

Solving this equation for e^σ, we have

    e^σ = (√κ(A) + 1)/(√κ(A) - 1),   with κ(A) = b/a.

We then obtain

    T_k((b + a)/(b - a)) = cosh(kσ) = (e^{kσ} + e^{-kσ})/2 ≥ (1/2) e^{kσ} = (1/2) ((√κ(A) + 1)/(√κ(A) - 1))^k,

which completes the proof.
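A quick numerical illustration of Corollary 1.7 (added; it uses MATLAB's built-in pcg rather than the listings in these notes, and all parameter choices are arbitrary): if b lies in the span of three eigenvectors, CG with u_0 = 0 converges in at most three iterations, up to round-off.

    n = 50;
    Q = orth(randn(n));                    % orthonormal eigenvector basis
    lambda = linspace(1, 100, n)';         % prescribed positive spectrum
    A = Q*diag(lambda)*Q';  A = (A + A')/2;
    b = Q(:,1) + 2*Q(:,5) + 3*Q(:,20);     % b in the span of three eigenvectors
    [u, flag, relres, iter] = pcg(A, b, 1e-10, n);   % built-in CG, x0 = 0
    fprintf('iterations = %d, relative residual = %g\n', iter, relres);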

The estimate (9) shows that CG is in general better than the gradient method. Furthermore, if the condition number of A is close to one, the CG iteration converges very fast. Even if κ(A) is large, the iteration performs well when the eigenvalues are clustered in a few small intervals; see the following estimate.

Corollary 1.10. Assume that σ(A) = σ_0(A) ∪ σ_1(A) and that l is the number of elements of σ_0(A). Then

(10)  ‖u - u_k‖_A ≤ 2M ((√(b/a) - 1)/(√(b/a) + 1))^{k-l} ‖u - u_0‖_A,

where a = min_{λ ∈ σ_1(A)} λ, b = max_{λ ∈ σ_1(A)} λ, and M = max_{λ ∈ σ_1(A)} Π_{μ ∈ σ_0(A)} |1 - λ/μ|.

1.4. Preconditioned Conjugate Gradient (PCG) Method. Let B be an SPD matrix. We apply the Gram-Schmidt process to the subspace

    V_k = span{B r_0, B r_1, ..., B r_k}

to get another A-orthogonal basis. That is, starting from an initial guess u_0 and p_0 = B r_0, for k = 0, 1, 2, ..., we use three recursion formulae to compute

    u_{k+1} = u_k + α_k p_k,   α_k = (B r_k, r_k)/(A p_k, p_k),
    r_{k+1} = r_k - α_k A p_k,
    p_{k+1} = B r_{k+1} + β_k p_k,   β_k = (B r_{k+1}, r_{k+1})/(B r_k, r_k).

We need to justify that the direction p_{k+1} computed above is A-orthogonal to V_k.

Lemma 1.11. B r_{k+1} is A-orthogonal to V_{k-1}, and p_{k+1} is A-orthogonal to V_k.

Exercise 1.12. Prove Lemma 1.11.

We present the algorithm of the preconditioned conjugate gradient method as follows.

    function u = pcg(A,b,u,B,tol)
    % Preconditioned conjugate gradient method: solve A*u = b with an SPD
    % preconditioner B, starting from u.
    tol = tol*norm(b);
    k = 1;
    r = b - A*u;
    rho = 1;
    while sqrt(rho) > tol && k < length(b)
        Br = B*r;
        rho = r'*Br;
        if k == 1
            p = Br;
        else
            beta = rho/rho_old;      % beta_k = (B r_{k+1},r_{k+1})/(B r_k,r_k)
            p = Br + beta*p;
        end
        Ap = A*p;
        alpha = rho/(p'*Ap);         % alpha_k = (B r_k,r_k)/(A p_k,p_k)
        u = u + alpha*p;
        r = r - alpha*Ap;
        rho_old = rho;
        k = k + 1;
    end
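A possible call of this routine with the diagonal (Jacobi) preconditioner B = D^{-1} discussed at the end of this section (added for illustration; the test matrix and tolerance are arbitrary, and the listing is assumed to be saved as pcg.m, shadowing MATLAB's built-in pcg):

    A  = gallery('poisson', 10);                 % 100x100 SPD matrix
    n  = size(A, 1);
    b  = ones(n, 1);
    u0 = zeros(n, 1);
    B  = spdiags(1./diag(A), 0, n, n);           % B = inv(diag(A))
    u  = pcg(A, b, u0, B, 1e-8);                 % the notes' pcg above
    fprintf('relative residual = %g\n', norm(b - A*u)/norm(b));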

Similarly, we collect several remarks on the implementation:
- If we think of A : V → V′, the residual lies in the dual space V′, and B is the Riesz representation from V′ to V with respect to some inner product. In the original CG, B is the identity matrix.
- A general guiding principle for designing a good preconditioner is to find the correct inner product such that the operator A is continuous and stable.
- For an effective PCG, the computation B*r is crucial. The matrix B does not have to be formed explicitly; all we need is the matrix-vector multiplication, which can be replaced by a subroutine.

To perform the convergence analysis, we let r̃_0 = B r_0. Then one can easily verify that

    V_k = span{r̃_0, (BA) r̃_0, ..., (BA)^k r̃_0}.

Since the optimality

    ‖u - u_k‖_A = inf_{v ∈ V_{k-1}} ‖u - u_0 + v‖_A

still holds, and BA is symmetric in the A-inner product, we obtain the following convergence rate of PCG.

Theorem 1.13. Let A be SPD and let u_k be the k-th iterate of the PCG method with preconditioner B and initial guess u_0. Then

(11)  ‖u - u_k‖_A = inf_{p_k ∈ P_k, p_k(0)=1} ‖p_k(BA)(u - u_0)‖_A,

(12)  ‖u - u_k‖_A ≤ inf_{p_k ∈ P_k, p_k(0)=1} sup_{λ ∈ σ(BA)} |p_k(λ)| ‖u - u_0‖_A,

(13)  ‖u - u_k‖_A ≤ 2 ((√κ(BA) - 1)/(√κ(BA) + 1))^k ‖u - u_0‖_A.

The SPD matrix B is called a preconditioner. A good preconditioner should have the properties that the action of B is easy to compute and that κ(BA) is significantly smaller than κ(A). Designing a good preconditioner requires knowledge of the problem, and finding good preconditioners is a central topic of scientific computing.

For a linear iterative method Φ_B with a symmetric iterator B, the iterator can be used as a preconditioner for A, and the corresponding PCG method converges at a faster rate. Let ρ denote the convergence rate of Φ_B, i.e., ρ = ρ(I - BA). Then σ(BA) ⊂ [1 - ρ, 1 + ρ], so κ(BA) ≤ (1 + ρ)/(1 - ρ) and

    δ = (√κ(BA) - 1)/(√κ(BA) + 1) ≤ (1 - √(1 - ρ^2))/ρ < ρ.
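A small numerical check of this chain of inequalities (added; the damped Jacobi iterator and test matrix are arbitrary choices, not from the notes):

    A = gallery('poisson', 10);  n = size(A, 1);
    omega = 0.5;                                   % damping, so that rho(I - B*A) < 1
    B = omega * spdiags(1./diag(A), 0, n, n);      % damped Jacobi iterator/preconditioner
    rho   = max(abs(eig(full(speye(n) - B*A))));
    ev    = sort(real(eig(full(B*A))));            % spectrum of BA (real and positive)
    delta = (sqrt(ev(end)/ev(1)) - 1)/(sqrt(ev(end)/ev(1)) + 1);
    fprintf('rho = %.3f, delta = %.3f, (1-sqrt(1-rho^2))/rho = %.3f\n', ...
            rho, delta, (1 - sqrt(1 - rho^2))/rho);   % observe delta <= bound < rho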

A more interesting fact is that the scheme Φ_B may not be convergent at all, whereas B can always be used as a preconditioner; that is, even if ρ > 1, the PCG rate δ < 1. For example, the Jacobi method is not convergent for all SPD systems, but B = D^{-1} can always be used as a preconditioner, often known as the diagonal preconditioner. Another popular preconditioner is the incomplete Cholesky factorization.

2. GMRES

For a general matrix A, we can reformulate the equation Au = b as the least squares problem

    min_{u ∈ R^N} ‖b - Au‖,

and restrict the minimization to the affine subspace u_0 + V_k, i.e.,

(14)  min_{v ∈ u_0 + V_k} ‖b - Av‖.

Let u_k be the solution of (14) and r_k = b - A u_k. Identically to the proof for CG, we immediately get

(15)  ‖r_k‖ = min_{p ∈ P_k, p(0)=1} ‖p(A) r_0‖ ≤ min_{p ∈ P_k, p(0)=1} ‖p(A)‖ ‖r_0‖.

Unlike the case when A is SPD, the norm ‖p(A)‖ is now not easy to estimate, since the spectrum of a general matrix A includes complex numbers and is much harder to characterize. A convergence analysis for diagonalizable matrices can be found in [2].

At the algorithmic level, GMRES is also more expensive than CG: there is no recursive formula to update the approximation. One has to
(1) compute and store an orthogonal basis of V_k;
(2) solve the least squares problem (14).
The k orthogonal basis vectors can be computed by the Gram-Schmidt process, requiring O(kN) memory and operations, and the least squares problem can be solved with O(k^3) complexity. Adding one matrix-vector product with O(sN) operations, where s is the sparsity of the matrix (the average number of non-zeros per row), the total complexity of m GMRES iterations is O(smN + m^2 N + m^4). As the iteration progresses, the term m^2 N + m^4 grows quickly; in addition, mN memory is required to store the m orthogonal basis vectors. GMRES is therefore not efficient for large m.

One simple fix is restarting: choose an upper bound for m, say 20, and after that many iterations discard the previous basis and build a new one. The restart loses the optimality (15) and in turn slows down the convergence, since the space in (14) is changed. A good preconditioner is thus even more important for GMRES.

Suppose we can choose a preconditioner B such that ‖I - BA‖ = ρ < 1, and solve BAu = Bb by GMRES. The estimate (15) then leads to the convergence

    ‖B r_k‖ ≤ ρ^k ‖B r_0‖.

One can also solve ABv = b by GMRES, provided ‖I - AB‖ = ρ < 1, and set u = Bv. These two variants are called left and right preconditioning, respectively. Again, a general guiding principle for designing a good preconditioner is to find the correct inner product such that the operator A is continuous and stable, and to use the corresponding Riesz representation as the preconditioner.
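As an illustration only (this uses MATLAB's built-in gmres and ilu, not code from these notes; the nonsymmetric test matrix and all parameters are arbitrary choices), restarted GMRES with an incomplete-LU preconditioner might be called as follows:

    n = 400;
    P = gallery('poisson', 20);                 % SPD diffusion-like part
    S = spdiags(ones(n,1), 1, n, n);            % superdiagonal shift
    A = P + 0.5*(S - S');                       % nonsymmetric (convection-like) matrix
    b = ones(n, 1);
    setup.type = 'nofill';
    [L, U] = ilu(A, setup);                     % ILU(0) preconditioner
    restart = 20;                               % restart length m
    [u, flag, relres, iter] = gmres(A, b, restart, 1e-8, 20, L, U);
    fprintf('flag = %d, relative residual = %g\n', flag, relres);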

REFERENCES

[1] M. Hestenes and E. Stiefel. Methods of conjugate gradients for solving linear systems. Journal of Research of the National Bureau of Standards, 49(6), 1952.
[2] C. Kelley. Iterative Methods for Linear and Nonlinear Equations. Society for Industrial and Applied Mathematics, 1995.
[3] Y. Saad and M. H. Schultz. GMRES: A generalized minimal residual algorithm for solving nonsymmetric linear systems. SIAM J. Sci. Statist. Comput., 7:856-869, 1986.