A short course on: Preconditioned Krylov subspace methods. Yousef Saad University of Minnesota Dept. of Computer Science and Engineering


A short course on: Preconditioned Krylov subspace methods. Yousef Saad, University of Minnesota, Dept. of Computer Science and Engineering. Université du Littoral, January 19-30, 2005.

Outline
Part 1: Introduction, discretization of PDEs; sparse matrices and sparsity; basic iterative methods (relaxation, ...)
Part 2: Projection methods; Krylov subspace methods
Part 3: Preconditioned iterations; preconditioning techniques; parallel implementations
Part 4: Eigenvalue problems; applications

THE PROBLEM

We consider the linear system Ax = b, where A is N x N and can be: real symmetric positive definite; real nonsymmetric; or complex. Focus: A is large and sparse, possibly with an irregular structure.

PROJECTION METHODS FOR LINEAR SYSTEMS

PROJECTION METHODS

Initial problem: b - Ax = 0. Given two subspaces K and L of R^N, define the approximate problem: find x̃ ∈ K such that b - A x̃ ⊥ L. This leads to a small linear system (a "projected problem"); it is a basic projection step. Typically, a sequence of such steps is applied.

With a nonzero initial guess x_0, the approximate problem is: find x̃ ∈ x_0 + K such that b - A x̃ ⊥ L. Write x̃ = x_0 + δ and r_0 = b - A x_0. This leads to a system for δ: find δ ∈ K such that r_0 - A δ ⊥ L.

Matrix representation: let V = [v_1,..., v_m] be a basis of K and W = [w_1,..., w_m] a basis of L. Then, writing the approximate solution as x̃ = x_0 + δ ≡ x_0 + V y, where y is a vector of R^m, the Petrov-Galerkin condition yields W^T (r_0 - A V y) = 0, and therefore

    x̃ = x_0 + V [W^T A V]^{-1} W^T r_0

Remark: in practice W^T A V is known from the algorithm and has a simple structure [tridiagonal, Hessenberg, ...].

PROTOTYPE PROJECTION METHOD

Until convergence Do:
1. Select a pair of subspaces K and L;
2. Choose bases V = [v_1,..., v_m] for K and W = [w_1,..., w_m] for L;
3. Compute r := b - Ax, y := (W^T A V)^{-1} W^T r, x := x + V y.

OPERATOR FORM REPRESENTATION

Let P be the orthogonal projector onto K, and Q the (oblique) projector onto K and orthogonally to L:
Px ∈ K, x - Px ⊥ K;   Qx ∈ K, x - Qx ⊥ L.
[Figure: the P and Q projectors.]
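
The basic projection step above translates almost directly into code. Here is a minimal NumPy sketch of one Petrov-Galerkin step, not taken from the slides: the function name, the dense storage of V and W, and the direct solve of the small projected system are illustrative assumptions.

import numpy as np

def projection_step(A, b, x, V, W):
    """One Petrov-Galerkin projection step (illustrative sketch).

    V: n x m basis of the search space K
    W: n x m basis of the constraint space L
    Solves the small projected system (W^T A V) y = W^T r
    and returns the updated approximation x + V y.
    """
    r = b - A @ x                      # current residual
    T = W.T @ (A @ V)                  # m x m projected matrix
    y = np.linalg.solve(T, W.T @ r)    # small dense solve
    return x + V @ y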

The approximate problem amounts to solving Q(b - Ax) = 0, x ∈ K, or, in operator form, Q(b - APx) = 0. Question: what accuracy can one expect? Let x* be the exact solution. Then:
1) we cannot get better accuracy than ‖(I - P)x*‖_2, i.e., ‖x* - x̃‖_2 ≥ ‖(I - P)x*‖_2;
2) the residual of the exact solution for the approximate problem satisfies ‖b - QAPx*‖_2 ≤ ‖QA(I - P)‖_2 ‖(I - P)x*‖_2.

Two Important Particular Cases
1. L = AK. Then ‖b - A x̃‖_2 = min_{z ∈ K} ‖b - Az‖_2. This is the class of minimal residual methods: CR, GCR, ORTHOMIN, GMRES, CGNR, ...
2. L = K. This is the class of Galerkin, or orthogonal projection, methods. When A is SPD, then ‖x* - x̃‖_A = min_{z ∈ K} ‖x* - z‖_A.

One-dimensional projection processes
K = span{d} and L = span{e}. Then x̃ = x + α d, and the Petrov-Galerkin condition r - Aδ ⊥ e yields α = (r, e)/(Ad, e). Three popular choices:

(I) Steepest descent (A is SPD). Take at each step d = r and e = r.
Iteration: r := b - Ax, α := (r, r)/(Ar, r), x := x + α r.
Each step minimizes f(x) = ‖x - x*‖_A^2 = (A(x - x*), x - x*) in the direction -∇f. Convergence is guaranteed if A is SPD.

(II) Residual norm steepest descent (A arbitrary, nonsingular). Take at each step d = A^T r and e = Ad.
Iteration: r := b - Ax, d := A^T r, α := ‖d‖_2^2 / ‖Ad‖_2^2, x := x + α d.
Each step minimizes f(x) = ‖b - Ax‖_2^2 in the direction -∇f. Important note: this is equivalent to the usual steepest descent applied to the normal equations A^T Ax = A^T b. It converges under the condition that A is nonsingular.
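
As an illustration of choice (I), a minimal NumPy sketch of the steepest descent iteration for an SPD matrix; the stopping test and the argument names are illustrative additions.

import numpy as np

def steepest_descent(A, b, x0, tol=1e-8, maxit=1000):
    """Steepest descent for SPD A (choice (I) above): d = e = r.
    Minimal sketch with an illustrative relative-residual stopping rule."""
    x = x0.copy()
    r = b - A @ x
    bnorm = np.linalg.norm(b)
    for _ in range(maxit):
        Ar = A @ r
        alpha = (r @ r) / (r @ Ar)     # minimizes ||x - x*||_A along r
        x += alpha * r
        r = b - A @ x
        if np.linalg.norm(r) <= tol * bnorm:
            break
    return x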

(III) Minimal residual iteration (A positive definite, i.e., A + A^T is SPD). Take at each step d = r and e = Ar.
Iteration: r := b - Ax, α := (Ar, r)/(Ar, Ar), x := x + α r.
Each step minimizes f(x) = ‖b - Ax‖_2^2 in the direction r. It converges under the condition that A + A^T is SPD.

KRYLOV SUBSPACE METHODS

Principle: projection methods on Krylov subspaces, i.e., on
    K_m(A, v_1) = span{v_1, A v_1,..., A^{m-1} v_1}.
This is probably the most important class of iterative methods. Many variants exist, depending on the subspace L.

Simple properties of K_m. Let μ be the degree of the minimal polynomial of v_1:
- K_m = {p(A) v_1 : p = polynomial of degree ≤ m - 1};
- K_m = K_μ for all m ≥ μ; moreover, K_μ is invariant under A;
- dim(K_m) = m iff μ ≥ m.

A little review: the Gram-Schmidt process
Goal: given X = [x_1,..., x_m], compute an orthonormal set Q = [q_1,..., q_m] which spans the same subspace.

ALGORITHM : 1 Classical Gram-Schmidt
1. For j = 1,..., m Do:
2.   Compute r_ij = (x_j, q_i) for i = 1,..., j-1
3.   Compute q̂_j = x_j - Σ_{i=1}^{j-1} r_ij q_i
4.   r_jj = ‖q̂_j‖_2. If r_jj == 0 exit
5.   q_j = q̂_j / r_jj
6. EndDo

ALGORITHM : 2 Modified Gram-Schmidt
1. For j = 1,..., m Do:
2.   q̂_j := x_j
3.   For i = 1,..., j-1 Do:
4.     r_ij = (q̂_j, q_i)
5.     q̂_j := q̂_j - r_ij q_i
6.   EndDo
7.   r_jj = ‖q̂_j‖_2. If r_jj == 0 exit
8.   q_j := q̂_j / r_jj
9. EndDo
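
For reference, a minimal NumPy rendering of Algorithm 2 (modified Gram-Schmidt); returning the factors as dense arrays is an illustrative choice.

import numpy as np

def modified_gram_schmidt(X):
    """Modified Gram-Schmidt (Algorithm 2): returns Q, R with X = Q R.
    Minimal dense sketch; it simply stops when a column becomes
    numerically zero (linear dependence)."""
    n, m = X.shape
    Q = np.zeros((n, m))
    R = np.zeros((m, m))
    for j in range(m):
        q = X[:, j].copy()
        for i in range(j):
            R[i, j] = q @ Q[:, i]      # project against already-built q_i
            q -= R[i, j] * Q[:, i]
        R[j, j] = np.linalg.norm(q)
        if R[j, j] == 0:
            break
        Q[:, j] = q / R[j, j]
    return Q, R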

Let X = [x_1,..., x_m] (n x m matrix), Q = [q_1,..., q_m] (n x m matrix), and R = {r_ij} (m x m upper triangular matrix). At each step, x_j = Σ_{i=1}^{j} r_ij q_i. Result: X = QR.

ARNOLDI'S ALGORITHM

Goal: compute an orthogonal basis of K_m.
Input: initial vector v_1 with ‖v_1‖_2 = 1, and m.
For j = 1,..., m do:
  Compute w := A v_j
  For i = 1,..., j, do: h_{i,j} := (w, v_i); w := w - h_{i,j} v_i
  h_{j+1,j} := ‖w‖_2 and v_{j+1} := w / h_{j+1,j}

Arnoldi's Method (L_m = K_m)

From the Petrov-Galerkin condition, when L_m = K_m we get
    x_m = x_0 + V_m H_m^{-1} V_m^T r_0.
Result of the orthogonalization process (Arnoldi's algorithm):
1. V_m = [v_1, v_2,..., v_m] is an orthonormal basis of K_m;
2. A V_m = V_{m+1} H̄_m;
3. V_m^T A V_m = H_m ≡ H̄_m with its last row deleted.
If, in addition, we choose v_1 = r_0/‖r_0‖_2 ≡ r_0/β in Arnoldi's algorithm, then
    x_m = x_0 + β V_m H_m^{-1} e_1.
Several algorithms are mathematically equivalent to this approach:
* FOM [Saad, 1981] (the above formulation)
* Young and Jea's ORTHORES [1982]
* Axelsson's projection method [1981]
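
A minimal NumPy sketch of the Arnoldi process and of the resulting FOM approximation x_m = x_0 + β V_m H_m^{-1} e_1; the function names, dense Hessenberg storage, and breakdown handling are illustrative assumptions.

import numpy as np

def arnoldi(A, v1, m):
    """Arnoldi process with modified Gram-Schmidt orthogonalization.
    Returns V (n x (m+1)) and the (m+1) x m Hessenberg matrix Hbar such
    that A V[:, :m] = V @ Hbar.  Minimal sketch; on breakdown it returns
    the smaller factors found so far."""
    n = v1.shape[0]
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    V[:, 0] = v1 / np.linalg.norm(v1)
    for j in range(m):
        w = A @ V[:, j]
        for i in range(j + 1):
            H[i, j] = w @ V[:, i]
            w -= H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] == 0:
            return V[:, :j + 1], H[:j + 1, :j]   # invariant subspace found
        V[:, j + 1] = w / H[j + 1, j]
    return V, H

def fom(A, b, x0, m):
    """Full Orthogonalization Method: x_m = x_0 + beta * V_m H_m^{-1} e_1."""
    r0 = b - A @ x0
    beta = np.linalg.norm(r0)
    V, Hbar = arnoldi(A, r0 / beta, m)
    k = Hbar.shape[1]                       # effective subspace dimension
    rhs = np.zeros(k)
    rhs[0] = beta                           # beta * e_1
    y = np.linalg.solve(Hbar[:k, :], rhs)   # solve H_k y = beta e_1
    return x0 + V[:, :k] @ y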

Minimal residual methods (L_m = A K_m)

When L_m = A K_m, we let W_m ≡ A V_m and obtain the relation
    x_m = x_0 + V_m [W_m^T A V_m]^{-1} W_m^T r_0 = x_0 + V_m [(A V_m)^T A V_m]^{-1} (A V_m)^T r_0.
Using again v_1 := r_0/(β := ‖r_0‖_2) and the relation A V_m = V_{m+1} H̄_m:
    x_m = x_0 + V_m [H̄_m^T H̄_m]^{-1} H̄_m^T β e_1 = x_0 + V_m y_m,
where y_m minimizes ‖β e_1 - H̄_m y‖_2 over y ∈ R^m. Therefore (Generalized Minimal Residual method (GMRES) [Saad-Schultz, 1983]):
    x_m = x_0 + V_m y_m, where y_m : min_y ‖β e_1 - H̄_m y‖_2.
Equivalent methods: Axelsson's CGLS, Orthomin (1980), Orthodir, GCR.

Restarting and Truncating

Difficulty: as m increases, storage and work per step increase fast.
First remedy: restarting. Fix the dimension m of the subspace.

ALGORITHM : 3 Restarted GMRES (resp. Arnoldi)
1. Start/Restart: compute r_0 = b - A x_0, and v_1 = r_0/(β := ‖r_0‖_2).
2. Arnoldi process: generate H̄_m and V_m.
3. Compute y_m = H_m^{-1} β e_1 (FOM), or y_m = argmin_y ‖β e_1 - H̄_m y‖_2 (GMRES).
4. x_m = x_0 + V_m y_m
5. If ‖r_m‖_2 ≤ ε ‖r_0‖_2 stop, else set x_0 := x_m and go to 1.

Second remedy: truncate the orthogonalization, i.e., each v_j is made orthogonal to the previous k v_i's only. The formula for v_{j+1} is replaced by
    h_{j+1,j} v_{j+1} = A v_j - Σ_{i=j-k+1}^{j} h_ij v_i,
and x_m is still computed as x_m = x_0 + V_m H_m^{-1} β e_1. It can be shown that this is again an oblique projection process. IOM (Incomplete Orthogonalization Method) = FOM with the orthogonalization replaced by the above truncated (or "incomplete") orthogonalization.

The direct version of IOM [DIOM]: writing the LU decomposition of H_m as H_m = L_m U_m, we get
    x_m = x_0 + V_m U_m^{-1} L_m^{-1} β e_1 ≡ x_0 + P_m z_m.
Structure of L_m, U_m when k = 3: L_m is unit lower bidiagonal, and U_m is upper triangular with only k = 3 nonzero diagonals (the main diagonal and two superdiagonals).
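
A minimal sketch of restarted GMRES(m) in the spirit of Algorithm 3, reusing the arnoldi() sketch above. It solves the small least-squares problem with a dense solver instead of the Givens rotations discussed below, and the stopping rule is an illustrative assumption.

import numpy as np
# assumes the arnoldi() sketch given earlier in these notes

def gmres_restarted(A, b, x0, m, tol=1e-8, max_restarts=50):
    """Restarted GMRES(m) (Algorithm 3), minimal sketch.
    min ||beta e1 - Hbar y||_2 is solved with a dense least-squares call."""
    x = x0.copy()
    bnorm = np.linalg.norm(b)
    for _ in range(max_restarts):
        r = b - A @ x
        beta = np.linalg.norm(r)
        if beta <= tol * bnorm:
            break
        V, Hbar = arnoldi(A, r / beta, m)
        k = Hbar.shape[1]
        rhs = np.zeros(Hbar.shape[0])
        rhs[0] = beta
        y, *_ = np.linalg.lstsq(Hbar, rhs, rcond=None)
        x = x + V[:, :k] @ y
    return x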

In DIOM, p_m = u_mm^{-1} [v_m - Σ_{i=m-k+1}^{m-1} u_im p_i] and z_m = [z_{m-1}; ζ_m], so x_m can be updated at each step: x_m = x_{m-1} + ζ_m p_m.

Note: several existing pairs of methods have a similar link; they are based on the LU, or other, factorizations of the H_m matrix:
- the CG-like formulation of IOM called DIOM [Saad, 1982];
- ORTHORES(k) [Young & Jea '82], equivalent to DIOM(k);
- SYMMLQ [Paige and Saunders, '77], which uses the LQ factorization of H_m.
One can incorporate partial pivoting in the LU factorization of H_m.

Some implementation details: GMRES

Issue 1: how to solve the least-squares problem?
Issue 2: how to compute the residual norm (without computing the solution at each step)?
Several solutions exist for both issues. The simplest: use Givens rotations. Recall that we want to solve the least-squares problem min_y ‖β e_1 - H̄_m y‖_2; the idea is to transform it into an upper triangular problem.

Rotation matrices of dimension m + 1: define (with s_i^2 + c_i^2 = 1) Ω_i as the identity matrix except for rows/columns i and i + 1, which contain the 2 x 2 block [c_i  s_i; -s_i  c_i]. Multiply H̄_m and the right-hand side ḡ ≡ β e_1 by a sequence of such matrices from the left; s_i, c_i are selected to eliminate h_{i+1,i}.

Example, m = 5: H̄_5 is the 6 x 5 upper Hessenberg matrix with entries h_ij (h_ij = 0 for i > j + 1), and ḡ = (β, 0,..., 0)^T. The first rotation Ω_1 acts on rows 1 and 2, with
    s_1 = h_21 / sqrt(h_11^2 + h_21^2),   c_1 = h_11 / sqrt(h_11^2 + h_21^2).
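
A minimal NumPy sketch of this elimination by Givens rotations; the function name and the final triangular solve are illustrative, and the returned |γ_{m+1}| equals the GMRES residual norm (see the proposition below).

import numpy as np

def givens_triangularize(Hbar, beta):
    """Eliminate the subdiagonal of the (m+1) x m Hessenberg matrix Hbar
    with Givens rotations, turning min ||beta e1 - Hbar y||_2 into an
    upper triangular problem.  Returns y_m and |gamma_{m+1}|, the latter
    being the residual norm.  Minimal sketch, no safeguards."""
    m = Hbar.shape[1]
    R = Hbar.astype(float).copy()
    g = np.zeros(m + 1)
    g[0] = beta
    for i in range(m):
        denom = np.hypot(R[i, i], R[i + 1, i])
        c, s = R[i, i] / denom, R[i + 1, i] / denom   # chosen to zero out h_{i+1,i}
        G = np.array([[c, s], [-s, c]])
        R[[i, i + 1], i:] = G @ R[[i, i + 1], i:]     # rotate rows i, i+1
        g[[i, i + 1]] = G @ g[[i, i + 1]]
    y = np.linalg.solve(R[:m, :m], g[:m])             # triangular solve R_m y = g_m
    return y, abs(g[m])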

After the first rotation, the transformed matrix H̄_m^{(1)} has its (2,1) entry eliminated: the first two rows contain modified entries h^{(1)}_{ij}, the remaining rows are unchanged, and the right-hand side becomes ḡ_1 = (c_1 β, -s_1 β, 0,..., 0)^T. Repeat with Ω_2,..., Ω_i. Result for m = 5: H̄_5^{(5)} is upper triangular with entries h^{(5)}_{ij} (its last row is zero), and ḡ_5 = (γ_1, γ_2, γ_3,..., γ_6)^T.

Define Q_m = Ω_m Ω_{m-1} ... Ω_1, R̄_m = H̄_m^{(m)} = Q_m H̄_m, and ḡ_m = Q_m(β e_1) = (γ_1,..., γ_{m+1})^T. Since Q_m is unitary,
    min ‖β e_1 - H̄_m y‖_2 = min ‖ḡ_m - R̄_m y‖_2.
Delete the last row and solve the resulting triangular system R_m y_m = g_m.

PROPOSITION:
1. The rank of A V_m is equal to the rank of R_m. In particular, if r_mm = 0 then A must be singular.
2. The vector y_m which minimizes ‖β e_1 - H̄_m y‖_2 is given by y_m = R_m^{-1} g_m.
3. The residual vector at step m satisfies
    b - A x_m = V_{m+1}(β e_1 - H̄_m y_m) = V_{m+1} Q_m^T (γ_{m+1} e_{m+1})
and, as a result, ‖b - A x_m‖_2 = |γ_{m+1}|.

THE SYMMETRIC CASE: Observation

Observe: when A is real symmetric, then in Arnoldi's method H_m = V_m^T A V_m must be symmetric. Therefore:
Theorem. When Arnoldi's algorithm is applied to a (real) symmetric matrix, the matrix H_m is symmetric tridiagonal, i.e., h_ij = 0 for 1 ≤ i < j - 1 and h_{j,j+1} = h_{j+1,j}, j = 1,..., m.

We can therefore write H_m as the symmetric tridiagonal matrix
    H_m = tridiag(β_i, α_i, β_{i+1}),
with diagonal entries α_1,..., α_m and off-diagonal entries β_2,..., β_m. The v_i's satisfy a three-term recurrence [the Lanczos algorithm]:
    β_{j+1} v_{j+1} = A v_j - α_j v_j - β_j v_{j-1}.
This is a simplified version of Arnoldi's algorithm for symmetric systems: symmetric matrix + Arnoldi = symmetric Lanczos.

The Lanczos algorithm

ALGORITHM : 4 Lanczos
1. Choose an initial vector v_1 of unit norm. Set β_1 ≡ 0, v_0 ≡ 0
2. For j = 1, 2,..., m Do:
3. w_j := A v_j - β_j v_{j-1}
4. α_j := (w_j, v_j)
5. w_j := w_j - α_j v_j
6. β_{j+1} := ‖w_j‖_2. If β_{j+1} = 0 then Stop
7. v_{j+1} := w_j / β_{j+1}
8. EndDo

Lanczos algorithm for linear systems

Usual orthogonal projection method setting: L_m = K_m = span{r_0, A r_0,..., A^{m-1} r_0}, with the basis V_m = [v_1,..., v_m] of K_m generated by the Lanczos algorithm. Three different implementations are possible: (1) Arnoldi-like; (2) exploiting the tridiagonal nature of H_m (DIOM); (3) the conjugate gradient algorithm.

ALGORITHM : 5 Lanczos Method for Linear Systems
1. Compute r_0 = b - A x_0, β := ‖r_0‖_2, and v_1 := r_0/β
2. For j = 1, 2,..., m Do:
3. w_j = A v_j - β_j v_{j-1} (if j = 1 set β_1 v_0 ≡ 0)
4. α_j = (w_j, v_j)
5. w_j := w_j - α_j v_j
6. β_{j+1} = ‖w_j‖_2. If β_{j+1} = 0 set m := j and go to 9
7. v_{j+1} = w_j / β_{j+1}
8. EndDo
9. Set T_m = tridiag(β_i, α_i, β_{i+1}) and V_m = [v_1,..., v_m].
10. Compute y_m = T_m^{-1}(β e_1) and x_m = x_0 + V_m y_m
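
A minimal NumPy sketch of the symmetric Lanczos recurrence (Algorithm 4), without the reorthogonalization a practical code may need; names and storage layout are illustrative.

import numpy as np

def lanczos(A, v1, m):
    """Symmetric Lanczos (Algorithm 4): three-term recurrence producing an
    orthonormal basis V of K_m(A, v1) and the tridiagonal entries.
    Here alpha[j] ~ alpha_{j+1} and beta[j] ~ beta_{j+1} (beta[0] = 0).
    Minimal sketch, no reorthogonalization."""
    n = v1.shape[0]
    V = np.zeros((n, m + 1))
    alpha = np.zeros(m)
    beta = np.zeros(m + 1)
    V[:, 0] = v1 / np.linalg.norm(v1)
    v_prev = np.zeros(n)
    for j in range(m):
        w = A @ V[:, j] - beta[j] * v_prev
        alpha[j] = w @ V[:, j]
        w -= alpha[j] * V[:, j]
        beta[j + 1] = np.linalg.norm(w)
        if beta[j + 1] == 0:               # invariant subspace: stop early
            return V[:, :j + 1], alpha[:j + 1], beta[:j + 2]
        v_prev = V[:, j]
        V[:, j + 1] = w / beta[j + 1]
    return V, alpha, beta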

ALGORITHM : 6 D-Lanczos
1. Compute r_0 = b - A x_0, ζ_1 := β := ‖r_0‖_2, and v_1 := r_0/β
2. Set λ_1 = β_1 = 0, p_0 = 0
3. For m = 1, 2,..., until convergence Do:
4. Compute w := A v_m - β_m v_{m-1} and α_m = (w, v_m)
5. If m > 1 then compute λ_m = β_m/η_{m-1} and ζ_m = -λ_m ζ_{m-1}
6. η_m = α_m - λ_m β_m
7. p_m = η_m^{-1} (v_m - β_m p_{m-1})
8. x_m = x_{m-1} + ζ_m p_m
9. If x_m has converged then Stop
10. w := w - α_m v_m
11. β_{m+1} = ‖w‖_2, v_{m+1} = w/β_{m+1}
12. EndDo

The Conjugate Gradient Algorithm (A S.P.D.)

Note: the p_i's are A-orthogonal and the r_i's are orthogonal, and we have x_m = x_{m-1} + ξ_m p_m. So there must be an update of the form:
1. p_m = r_{m-1} + β_m p_{m-1}
2. x_m = x_{m-1} + ξ_m p_m
3. r_m = r_{m-1} - ξ_m A p_m

The Conjugate Gradient Algorithm (A S.P.D.)
1. Start: r_0 := b - A x_0, p_0 := r_0.
2. Iterate: until convergence do,
(a) α_j := (r_j, r_j)/(A p_j, p_j)
(b) x_{j+1} := x_j + α_j p_j
(c) r_{j+1} := r_j - α_j A p_j
(d) β_j := (r_{j+1}, r_{j+1})/(r_j, r_j)
(e) p_{j+1} := r_{j+1} + β_j p_j
Here r_j is a scalar multiple of v_{j+1}: the r_j's are orthogonal, and the p_j's are A-conjugate, i.e., (A p_i, p_j) = 0 for i ≠ j.

METHODS BASED ON LANCZOS BIORTHOGONALIZATION
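
A minimal NumPy sketch of the CG iteration following steps (a)-(e) above; the relative-residual stopping test is an illustrative addition.

import numpy as np

def conjugate_gradient(A, b, x0, tol=1e-8, maxit=1000):
    """Conjugate Gradient for SPD A, steps (a)-(e).  Minimal sketch."""
    x = x0.copy()
    r = b - A @ x
    p = r.copy()
    rho = r @ r
    bnorm = np.linalg.norm(b)
    for _ in range(maxit):
        Ap = A @ p
        alpha = rho / (p @ Ap)           # (r_j, r_j) / (A p_j, p_j)
        x += alpha * p
        r -= alpha * Ap
        rho_new = r @ r
        if np.sqrt(rho_new) <= tol * bnorm:
            break
        beta = rho_new / rho             # (r_{j+1}, r_{j+1}) / (r_j, r_j)
        p = r + beta * p
        rho = rho_new
    return x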

ALGORITHM : 7 The Lanczos Bi-Orthogonalization Procedure
1. Choose two vectors v_1, w_1 such that (v_1, w_1) = 1.
2. Set β_1 = δ_1 ≡ 0, w_0 = v_0 ≡ 0
3. For j = 1, 2,..., m Do:
4. α_j = (A v_j, w_j)
5. v̂_{j+1} = A v_j - α_j v_j - β_j v_{j-1}
6. ŵ_{j+1} = A^T w_j - α_j w_j - δ_j w_{j-1}
7. δ_{j+1} = |(v̂_{j+1}, ŵ_{j+1})|^{1/2}. If δ_{j+1} = 0 Stop
8. β_{j+1} = (v̂_{j+1}, ŵ_{j+1})/δ_{j+1}
9. w_{j+1} = ŵ_{j+1}/β_{j+1}
10. v_{j+1} = v̂_{j+1}/δ_{j+1}
11. EndDo

This is an extension of the symmetric Lanczos algorithm. It builds a pair of biorthogonal bases for the two subspaces K_m(A, v_1) and K_m(A^T, w_1): v_i ∈ K_m(A, v_1) and w_j ∈ K_m(A^T, w_1). There are different ways to choose δ_{j+1}, β_{j+1} in lines 7 and 8.

Let T_m be the tridiagonal matrix with diagonal α_1,..., α_m, superdiagonal β_2,..., β_m, and subdiagonal δ_2,..., δ_m.

If the algorithm does not break down before step m, then the vectors v_i, i = 1,..., m, and w_j, j = 1,..., m, are biorthogonal, i.e., (v_j, w_i) = δ_ij for 1 ≤ i, j ≤ m. Moreover, {v_i}_{i=1,...,m} is a basis of K_m(A, v_1), {w_i}_{i=1,...,m} is a basis of K_m(A^T, w_1), and
    A V_m = V_m T_m + δ_{m+1} v_{m+1} e_m^T,
    A^T W_m = W_m T_m^T + β_{m+1} w_{m+1} e_m^T,
    W_m^T A V_m = T_m.

The Lanczos Algorithm for Linear Systems

ALGORITHM : 8 Lanczos Algorithm for Linear Systems
1. Compute r_0 = b - A x_0 and β := ‖r_0‖_2
2. Run m steps of the nonsymmetric Lanczos algorithm, i.e.,
3. start with v_1 := r_0/β, and any w_1 such that (v_1, w_1) = 1,
4. generate the Lanczos vectors v_1,..., v_m, w_1,..., w_m
5. and the tridiagonal matrix T_m from Algorithm 7.
6. Compute y_m = T_m^{-1}(β e_1) and x_m := x_0 + V_m y_m.

BCG can be derived from the Lanczos algorithm similarly to CG in the symmetric case.

The BCG and QMR Algorithms

Let T_m = L_m U_m (LU factorization of T_m) and define P_m = V_m U_m^{-1}. Then the solution is
    x_m = x_0 + V_m T_m^{-1}(β e_1) = x_0 + V_m U_m^{-1} L_m^{-1}(β e_1) = x_0 + P_m L_m^{-1}(β e_1),
so x_m is updatable from x_{m-1}, similarly to the CG algorithm. The r_j's and r*_j's are in the same directions as v_{j+1} and w_{j+1} respectively, so they form a biorthogonal sequence; the p_i's and p*_i's are A-conjugate. Utilizing this information, a CG-like algorithm can easily be derived from the Lanczos procedure.

ALGORITHM : 9 BiConjugate Gradient (BCG)
1. Compute r_0 := b - A x_0. Choose r*_0 such that (r_0, r*_0) ≠ 0.
2. Set p_0 := r_0, p*_0 := r*_0
3. For j = 0, 1,..., until convergence Do:
4. α_j := (r_j, r*_j)/(A p_j, p*_j)
5. x_{j+1} := x_j + α_j p_j
6. r_{j+1} := r_j - α_j A p_j
7. r*_{j+1} := r*_j - α_j A^T p*_j
8. β_j := (r_{j+1}, r*_{j+1})/(r_j, r*_j)
9. p_{j+1} := r_{j+1} + β_j p_j
10. p*_{j+1} := r*_{j+1} + β_j p*_j
11. EndDo

Quasi-Minimal Residual Algorithm

The Lanczos algorithm gives the relation A V_m = V_{m+1} T̄_m, where T̄_m is the (m + 1) x m tridiagonal matrix T̄_m = [T_m; δ_{m+1} e_m^T]. Let v_1 := r_0/β and x = x_0 + V_m y. The residual norm ‖b - Ax‖_2 is
    ‖r_0 - A V_m y‖_2 = ‖β v_1 - V_{m+1} T̄_m y‖_2 = ‖V_{m+1}(β e_1 - T̄_m y)‖_2.
The column-vectors of V_{m+1} are not orthonormal (unlike GMRES). But it is a reasonable idea to minimize the function J(y) ≡ ‖β e_1 - T̄_m y‖_2. This gives the Quasi-Minimal Residual algorithm (Freund, 1990).

ALGORITHM : 10 QMR
1. Compute r_0 = b - A x_0 and γ_1 := ‖r_0‖_2, w_1 := v_1 := r_0/γ_1
2. For m = 1, 2,..., until convergence Do:
3. Compute α_m, δ_{m+1} and v_{m+1}, w_{m+1} as in the Lanczos procedure [Algorithm 7]
4. Update the QR factorization of T̄_m, i.e.,
5. apply Ω_i, i = m-2, m-1, to the m-th column of T̄_m;
6. compute the rotation coefficients c_m, s_m;
7. apply the rotation Ω_m to T̄_m and ḡ_m, i.e., compute:
8. γ_{m+1} := -s_m γ_m; γ_m := c_m γ_m; and α_m := c_m α_m + s_m δ_{m+1}
9. p_m = (v_m - Σ_{i=m-2}^{m-1} t_im p_i)/t_mm
10. x_m = x_{m-1} + γ_m p_m
11. If |γ_{m+1}| is small enough Stop
12. EndDo
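
A minimal NumPy sketch of BCG (Algorithm 9). Taking the shadow residual r*_0 equal to r_0 and the stopping test are illustrative choices; no breakdown safeguards are included.

import numpy as np

def bicg(A, b, x0, tol=1e-8, maxit=1000):
    """BiConjugate Gradient (Algorithm 9), minimal sketch."""
    x = x0.copy()
    r = b - A @ x
    rs = r.copy()                    # shadow residual r*_0 (here simply r_0)
    p, ps = r.copy(), rs.copy()
    rho = r @ rs
    bnorm = np.linalg.norm(b)
    for _ in range(maxit):
        Ap = A @ p
        alpha = rho / (Ap @ ps)
        x += alpha * p
        r -= alpha * Ap
        rs -= alpha * (A.T @ ps)     # the transpose product required by BCG
        if np.linalg.norm(r) <= tol * bnorm:
            break
        rho_new = r @ rs
        beta = rho_new / rho
        p = r + beta * p
        ps = rs + beta * ps
        rho = rho_new
    return x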

Transpose-Free Variants

BCG and QMR require a matrix-by-vector product with A and with A^T at each step. The products with A^T do not contribute directly to x_m; they only serve to determine the scalars (α_j and β_j in BCG). QUESTION: is it possible to bypass the use of A^T? Motivation: in nonlinear equations, A is often not available explicitly but only through the Frechet derivative:
    J(u_k)v ≈ [F(u_k + εv) - F(u_k)]/ε.

Conjugate Gradient Squared

A clever variant of BCG which avoids using A^T [Sonneveld, 1984]. In BCG: r_i = ρ_i(A) r_0, where ρ_i is a polynomial of degree i. In CGS: r_i = ρ_i^2(A) r_0.

Define r_j = φ_j(A) r_0, p_j = π_j(A) r_0, r*_j = φ_j(A^T) r*_0, p*_j = π_j(A^T) r*_0. The scalar α_j in BCG is given by
    α_j = (φ_j(A) r_0, φ_j(A^T) r*_0)/(A π_j(A) r_0, π_j(A^T) r*_0) = (φ_j^2(A) r_0, r*_0)/(A π_j^2(A) r_0, r*_0).
Is it possible to get a recursion for the φ_j^2(A) r_0 and π_j^2(A) r_0?

Square the equalities φ_{j+1}(t) = φ_j(t) - α_j t π_j(t) and π_{j+1}(t) = φ_{j+1}(t) + β_j π_j(t):
    φ_{j+1}^2(t) = φ_j^2(t) - 2 α_j t π_j(t) φ_j(t) + α_j^2 t^2 π_j^2(t),
    π_{j+1}^2(t) = φ_{j+1}^2(t) + 2 β_j φ_{j+1}(t) π_j(t) + β_j^2 π_j^2(t).
Problem: cross terms.

Solution: let φ_{j+1}(t) π_j(t) be a third member of the recurrence. For π_j(t) φ_j(t), note that
    φ_j(t) π_j(t) = φ_j(t)(φ_j(t) + β_{j-1} π_{j-1}(t)) = φ_j^2(t) + β_{j-1} φ_j(t) π_{j-1}(t).
Result:
    φ_{j+1}^2 = φ_j^2 - α_j t (2 φ_j^2 + 2 β_{j-1} φ_j π_{j-1} - α_j t π_j^2)
    φ_{j+1} π_j = φ_j^2 + β_{j-1} φ_j π_{j-1} - α_j t π_j^2
    π_{j+1}^2 = φ_{j+1}^2 + 2 β_j φ_{j+1} π_j + β_j^2 π_j^2.
Define: r_j = φ_j^2(A) r_0, p_j = π_j^2(A) r_0, q_j = φ_{j+1}(A) π_j(A) r_0.

The recurrences become:
    r_{j+1} = r_j - α_j A (2 r_j + 2 β_{j-1} q_{j-1} - α_j A p_j),
    q_j = r_j + β_{j-1} q_{j-1} - α_j A p_j,
    p_{j+1} = r_{j+1} + 2 β_j q_j + β_j^2 p_j.
Define the auxiliary vector d_j = 2 r_j + 2 β_{j-1} q_{j-1} - α_j A p_j and one more auxiliary vector u_j = r_j + β_{j-1} q_{j-1}. Then d_j = u_j + q_j, q_j = u_j - α_j A p_j, p_{j+1} = u_{j+1} + β_j (q_j + β_j p_j), and the vector d_j is no longer needed.

Sequence of operations to compute the approximate solution, starting with r_0 := b - A x_0, p_0 := r_0, q_0 := 0, β_0 := 0:
1. α_j = (r_j, r*_0)/(A p_j, r*_0)
2. d_j = 2 r_j + 2 β_{j-1} q_{j-1} - α_j A p_j
3. q_j = r_j + β_{j-1} q_{j-1} - α_j A p_j
4. x_{j+1} = x_j + α_j d_j
5. r_{j+1} = r_j - α_j A d_j
6. β_j = (r_{j+1}, r*_0)/(r_j, r*_0)
7. p_{j+1} = r_{j+1} + β_j (2 q_j + β_j p_j)

ALGORITHM : 11 Conjugate Gradient Squared
1. Compute r_0 := b - A x_0; r*_0 arbitrary.
2. Set p_0 := u_0 := r_0.
3. For j = 0, 1, 2,..., until convergence Do:
4. α_j = (r_j, r*_0)/(A p_j, r*_0)
5. q_j = u_j - α_j A p_j
6. x_{j+1} = x_j + α_j (u_j + q_j)
7. r_{j+1} = r_j - α_j A (u_j + q_j)
8. β_j = (r_{j+1}, r*_0)/(r_j, r*_0)
9. u_{j+1} = r_{j+1} + β_j q_j
10. p_{j+1} = u_{j+1} + β_j (q_j + β_j p_j)
11. EndDo
Note: no matrix-by-vector products with A^T, but two matrix-by-vector products with A at each step.

Correspondence between CGS vectors and BCG polynomials: r_i corresponds to r_i(t)^2, p_i to p_i(t)^2, q_i to r_{i+1}(t) p_i(t), and u_i to r_i(t) p_i(t), where r_i(t) is the residual polynomial at step i of BCG, i.e., r_i^{BCG} = r_i(A) r_0, and p_i(t) is the conjugate direction polynomial at step i, i.e., p_i^{BCG} = p_i(A) r_0.
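
A minimal NumPy sketch of CGS (Algorithm 11), with the common (but illustrative) choice r*_0 = r_0 and no breakdown safeguards.

import numpy as np

def cgs(A, b, x0, tol=1e-8, maxit=1000):
    """Conjugate Gradient Squared (Algorithm 11), minimal sketch."""
    x = x0.copy()
    r = b - A @ x
    rs0 = r.copy()                       # r*_0 (illustrative choice: r_0)
    p = r.copy()
    u = r.copy()
    rho = r @ rs0
    bnorm = np.linalg.norm(b)
    for _ in range(maxit):
        Ap = A @ p
        alpha = rho / (Ap @ rs0)
        q = u - alpha * Ap
        x += alpha * (u + q)
        r -= alpha * (A @ (u + q))       # second product with A
        if np.linalg.norm(r) <= tol * bnorm:
            break
        rho_new = r @ rs0
        beta = rho_new / rho
        u = r + beta * q
        p = u + beta * (q + beta * p)
        rho = rho_new
    return x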

BCGSTAB (van der Vorst, 1992)

In CGS the residual polynomial of BCG is squared, which leads to bad behavior in case of irregular convergence. The Bi-Conjugate Gradient Stabilized method (BCGSTAB) is a variation of CGS which avoids this difficulty. Its derivation is similar to that of CGS. Residuals in BCGSTAB are of the form
    r_j = ψ_j(A) φ_j(A) r_0,
in which φ_j(t) is the BCG residual polynomial and ψ_j(t) is a new polynomial defined recursively as ψ_{j+1}(t) = (1 - ω_j t) ψ_j(t), with ω_j chosen to smooth convergence [a steepest descent step].

ALGORITHM : 12 BCGSTAB
1. Compute r_0 := b - A x_0; r*_0 arbitrary;
2. p_0 := r_0.
3. For j = 0, 1,..., until convergence Do:
4. α_j := (r_j, r*_0)/(A p_j, r*_0)
5. s_j := r_j - α_j A p_j
6. ω_j := (A s_j, s_j)/(A s_j, A s_j)
7. x_{j+1} := x_j + α_j p_j + ω_j s_j
8. r_{j+1} := s_j - ω_j A s_j
9. β_j := [(r_{j+1}, r*_0)/(r_j, r*_0)] x (α_j/ω_j)
10. p_{j+1} := r_{j+1} + β_j (p_j - ω_j A p_j)
11. EndDo

THEORY

Convergence Theory for CG
The approximation is of the form x = x_0 + p_{m-1}(A) r_0, with x_0 = initial guess and r_0 = b - A x_0. Optimality property: x_m minimizes ‖x* - x‖_A over x_0 + K_m.
Consequence (standard result): let x_m be the m-th CG iterate, x* the exact solution, and η = λ_min/(λ_max - λ_min). Then
    ‖x* - x_m‖_A ≤ ‖x* - x_0‖_A / T_m(1 + 2η),
where T_m is the Chebyshev polynomial of degree m.
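
A minimal NumPy sketch of BCGSTAB (Algorithm 12), again with the illustrative choice r*_0 = r_0 and no breakdown safeguards.

import numpy as np

def bicgstab(A, b, x0, tol=1e-8, maxit=1000):
    """BCGSTAB (Algorithm 12), minimal sketch."""
    x = x0.copy()
    r = b - A @ x
    rs0 = r.copy()                       # r*_0 (illustrative choice: r_0)
    p = r.copy()
    rho = r @ rs0
    bnorm = np.linalg.norm(b)
    for _ in range(maxit):
        Ap = A @ p
        alpha = rho / (Ap @ rs0)
        s = r - alpha * Ap
        As = A @ s
        omega = (As @ s) / (As @ As)     # local smoothing (steepest descent) step
        x += alpha * p + omega * s
        r = s - omega * As
        if np.linalg.norm(r) <= tol * bnorm:
            break
        rho_new = r @ rs0
        beta = (rho_new / rho) * (alpha / omega)
        p = r + beta * (p - omega * Ap)
        rho = rho_new
    return x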

THEORY FOR THE NONHERMITIAN CASE

Convergence results for the nonsymmetric case are much more difficult. There are no convincing results on global convergence for many algorithms (Bi-CG, FOM, etc.). One can get a general a-priori / a-posteriori error bound. Methods based on minimizing the residual are better understood. If A + A^T is positive definite ((Ax, x) > 0 for all x ≠ 0), then all minimum residual-type methods (ORTHOMIN, ORTHODIR, GCR, GMRES, ...), as well as their restarted and truncated versions, converge. Convergence results based on comparison with steepest descent [Eisenstat, Elman, Schultz 1982] are not sharp.

Minimum residual methods: if A = XΛX^{-1} with Λ diagonal, then
    ‖b - A x_m‖_2 ≤ Cond_2(X) min_{p ∈ P_{m-1}, p(0)=1} max_{λ ∈ Λ(A)} |p(λ)| ‖r_0‖_2
(P_{m-1} is the set of polynomials of degree ≤ m - 1, Λ(A) the spectrum of A).

Two useful projectors

Let P be the orthogonal projector onto K, and Q the (oblique) projector onto K and orthogonally to L: Px ∈ K, x - Px ⊥ K; Qx ∈ K, x - Qx ⊥ L. [Figure: the P and Q projectors.]

The approximate problem in terms of P and Q

The approximate problem amounts to solving Q(b - Ax) = 0, x ∈ K, or, in operator form, Q(b - APx) = 0. Question: what accuracy can one expect? If x* is the exact solution, then we cannot get better accuracy than ‖(I - P)x*‖_2, i.e., ‖x* - x̃‖_2 ≥ ‖(I - P)x*‖_2.

THEOREM. Let γ = ‖QA(I - P)‖_2 and assume that b belongs to K. Then the residual norm of the exact solution x* for the (approximate) linear operator A_m satisfies the inequality
    ‖b - A_m x*‖_2 ≤ γ ‖(I - P)x*‖_2.
In other words, if the approximate problem is not poorly conditioned and if ‖(I - P)x*‖_2 is small, then we will obtain a good approximate solution.

Methods based on the normal equations

It is possible to obtain the solution of Ax = b from either of the equivalent systems
    A^T A x = A^T b,   or   A A^T y = b with x = A^T y.
Methods based on these approaches are usually slower than the previous ones (the condition number of the system is squared). Exception: when A is strongly indefinite (extreme case: A orthogonal, A^T A = I, convergence in one step).

CGNR and CGNE

One can use CG to solve the normal equations. Two well-known options:
(1) CGNR: the Conjugate Gradient method applied to A^T A x = A^T b.
(2) CGNE: let x = A^T y and use the Conjugate Gradient method on A A^T y = b.
They have different optimality properties, and various efficient formulations exist in both cases.

ALGORITHM : 13 CGNR
1. Compute r_0 = b - A x_0, z_0 = A^T r_0, p_0 = z_0.
2. For i = 0,..., until convergence Do:
3. w_i = A p_i
4. α_i = ‖z_i‖_2^2 / ‖w_i‖_2^2
5. x_{i+1} = x_i + α_i p_i
6. r_{i+1} = r_i - α_i w_i
7. z_{i+1} = A^T r_{i+1}
8. β_i = ‖z_{i+1}‖_2^2 / ‖z_i‖_2^2
9. p_{i+1} = z_{i+1} + β_i p_i
10. EndDo
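
A minimal NumPy sketch of CGNR (Algorithm 13); the explicit products with A.T stand in for a transpose matrix-vector routine in a real sparse code, and the stopping test is illustrative.

import numpy as np

def cgnr(A, b, x0, tol=1e-8, maxit=1000):
    """CGNR (Algorithm 13): CG applied to the normal equations
    A^T A x = A^T b.  Minimal sketch."""
    x = x0.copy()
    r = b - A @ x
    z = A.T @ r
    p = z.copy()
    znorm2 = z @ z
    for _ in range(maxit):
        w = A @ p
        alpha = znorm2 / (w @ w)         # ||z_i||^2 / ||A p_i||^2
        x += alpha * p
        r -= alpha * w
        z = A.T @ r
        znorm2_new = z @ z
        if np.sqrt(znorm2_new) <= tol:   # illustrative test on ||A^T r||
            break
        beta = znorm2_new / znorm2
        p = z + beta * p
        znorm2 = znorm2_new
    return x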

CGNR: the approximation x_m minimizes the residual norm ‖b - Ax‖_2 over the affine Krylov subspace
    x_0 + span{A^T r_0, (A^T A) A^T r_0,..., (A^T A)^{m-1} A^T r_0},
where r_0 ≡ b - A x_0. The difference with GMRES is the subspace in which the residual norm is minimized; for GMRES the subspace is x_0 + K_m(A, r_0).

ALGORITHM : 14 CGNE (Craig's Method)
1. Compute r_0 = b - A x_0, p_0 = A^T r_0.
2. For i = 0, 1,..., until convergence Do:
3. α_i = (r_i, r_i)/(p_i, p_i)
4. x_{i+1} = x_i + α_i p_i
5. r_{i+1} = r_i - α_i A p_i
6. β_i = (r_{i+1}, r_{i+1})/(r_i, r_i)
7. p_{i+1} = A^T r_{i+1} + β_i p_i
8. EndDo

CGNE produces the approximate solution x in the subspace
    x_0 + A^T K_m(A A^T, r_0) = x_0 + K_m(A^T A, A^T r_0)
which minimizes ‖x* - x‖_2, where x* = A^{-1} b and r_0 = b - A x_0. Note: this is the same subspace as for CGNR!

Block GMRES and Block Krylov Methods

Main motivation: to solve linear systems with several right-hand sides,
    A x^(i) = b^(i), i = 1,..., p,   or, in matrix form, AX = B.
Sometimes block methods are also used as a strategy for enhancing convergence, even for the case p = 1. Let R_0 ≡ [r_0^(1), r_0^(2),..., r_0^(p)]; each column is r_0^(i) = b^(i) - A x_0^(i). Block Krylov methods find an approximation to X from the subspace

K m (A, R ) = span{r, AR,..., A m 1 R } For example Block-GMRES (BGMRES) nds X to minimize B AX F for X X + K m (A, R ) Various implementations of BGMRES exist Simplest one is based on Ruhe s variant of the Block Arnoldi procedure. ALGORITHM : 15 Block Arnoldi Ruhe s variant 1. Choose p initial orthonormal vectors {v i } i=1,...,p. 2. For j = p, p + 1,..., m Do: 3. Set k := j p + 1; 4. Compute w := Av k ; 5. For i = 1, 2,..., j Do: 6. h i,k := (w, v i ) 7. w := w h i,k v i 8. EndDo 9. Compute h j+1,k := w 2 and v j+1 := w/h j+1,k. 1. EndDo p = 1 coincides with standard Arnoldi process. Interesting feature: dimension of the subspace need not be a multiple of the block-size p. At the end of the algorithm, we have the relation AV m = V m+p H m. The matrix H m is now of size (m + p) m. Each approximate solution has the form x (i) = x (i) + V my (i), where y (i) must minimize the norm b (i) Ax (i) 2. Plane rotations can be used for this purpose as in the standard GMRES [p rotations are needed for each step.] Calais February 7, 25 75