Fast iterative solvers


Utrecht, 15 November 2017. Fast iterative solvers. Gerard Sleijpen, Department of Mathematics, http://www.staff.science.uu.nl/~sleij101/

Review. To simplify, take x_0 = 0 and assume ||b||_2 = 1. Solving Ax = b for a general square matrix A.
Arnoldi decomposition: A V_k = V_{k+1} H_k with V_k orthonormal, H_k Hessenberg; H_k = L_k U_k = Q_k R_k. v_1 = r_0 = b, x_k = V_k y_k, r_k = b - A x_k.
FOM: r_k ⟂ V_k  ⇔  H_k y_k = e_1, so x_k = V_k (U_k^{-1} (L_k^{-1} e_1)).
GMRES: y_k = argmin_y ||b - A V_k y||_2 = argmin_y ||e_1 - H_k y||_2, so x_k = V_k (R_k^{-1} (Q_k^* e_1)).
The columns of V_k span K_k(A, r_0) ≡ {p(A) r_0 | p ∈ P_{k-1}}; K_k(A, r_0) = span(r_0, r_1, ..., r_{k-1}).
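
As a concrete illustration of the two extractions above, here is a minimal MATLAB sketch (not taken from the lecture notes; the function name and the dense-matrix interface are illustrative). It runs k steps of Arnoldi and forms both the FOM and the GMRES iterate from the same decomposition.

  % k steps of Arnoldi with modified Gram-Schmidt, then the FOM and GMRES
  % extractions described above (x0 = 0; e1 is scaled by ||b||, so ||b|| = 1 is
  % not required).
  function [xF, xG] = fom_gmres_sketch(A, b, k)
    n = length(b); beta = norm(b);
    V = zeros(n, k+1); H = zeros(k+1, k);
    V(:,1) = b / beta;
    for j = 1:k
      w = A * V(:,j);
      for i = 1:j
        H(i,j) = V(:,i)' * w; w = w - H(i,j) * V(:,i);
      end
      H(j+1,j) = norm(w); V(:,j+1) = w / H(j+1,j);
    end
    e1 = zeros(k+1,1); e1(1) = beta;
    yF = H(1:k,1:k) \ e1(1:k);   % FOM: Galerkin condition r_k perp V_k
    yG = H \ e1;                 % GMRES: least squares, min ||e1 - H_k y||_2
    xF = V(:,1:k) * yF; xG = V(:,1:k) * yG;
  end

In practice GMRES updates a QR factorization of H_k with Givens rotations instead of calling a dense least-squares solve, but the small problem being solved is the same.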

Review. To simplify, take x_0 = 0 and assume ||b||_2 = 1. Solving Ax = b for a Hermitian matrix A.
Lanczos decomposition: A V_k = V_{k+1} T_k with V_k orthonormal, T_k tridiagonal; T_k = L_k U_k = Q_k R_k. v_1 = r_0 = b, x_k = V_k y_k, r_k = b - A x_k.
CG: r_k ⟂ V_k  ⇔  T_k y_k = e_1, so x_k = (V_k U_k^{-1})(L_k^{-1} e_1).
MINRES: y_k = argmin_y ||b - A V_k y||_2 = argmin_y ||e_1 - T_k y||_2, so x_k = (V_k R_k^{-1})(Q_k^* e_1).

Note. Short recurrences (efficient computation), because T_k is tridiagonal and the auxiliary bases are generated via U_k (CG) and via W_k R_k = V_k (MINRES).

Details. For MINRES, see Exercise 7.8.

Note. Less stable, because we rely on mathematics (exact arithmetic) for the orthogonality of V_k (CG & MINRES); W_k R_k = V_k (MINRES) is a three-term vector recurrence for the w_k's.

Note. If A is positive definite, then CG minimizes as well: y_k = argmin_y ||x - V_k y||_A = argmin_y ||b - A V_k y||_{A^{-1}}.

SYMMLQ. Take x_k = A V_k y_k with y_k = argmin_y ||x - A V_k y||_2  ⇔  T_k^* T_k y_k = e_1, so x_k = V_{k+1} T_k (T_k^* T_k)^{-1} e_1 = (V_{k+1} Q_k)(R_k^{-*} e_1).

Details. For SYMMLQ, see Exercise 7.9.

FOM residual polynomials and Ritz values

Property. r_k^CG = r_k^FOM = p_k(A) r_0 ⟂ V_k, for some residual polynomial p_k of degree k, i.e., p_k(0) = 1. p_k^FOM ≡ p_k is the kth (CG or) FOM residual polynomial.

Theorem. p_k^FOM(ϑ) = 0  ⇔  H_k y = ϑ y for some y ≠ 0: the zeros of p_k^FOM are precisely the kth order Ritz values. In particular,
p_k^FOM(λ) = ∏_{j=1}^{k} (1 - λ/ϑ_j)   (λ ∈ C).
Moreover, for a polynomial p of degree at most k with p(0) = 1, we have that
p = p_k^FOM  ⇔  p(H_k) = 0.
Proof. Exercise 6.6.

Theorem. Similarly, the zeros of p_k^GMRES are related to the harmonic Ritz values of H_k.
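
A quick numerical check of the theorem (a sketch with an assumed random test matrix, not part of the lecture): the FOM residual coincides with the polynomial built from the Ritz values applied to r_0.

  % Verify r_k^FOM = p_k(A) r0 with p_k(lambda) = prod_j (1 - lambda/theta_j),
  % where theta_j are the Ritz values (eigenvalues of the k-by-k Hessenberg block).
  n = 100; k = 8;
  A = randn(n); b = randn(n,1); b = b / norm(b);
  V = zeros(n,k+1); H = zeros(k+1,k); V(:,1) = b;     % k steps of Arnoldi
  for j = 1:k
    w = A*V(:,j);
    for i = 1:j, H(i,j) = V(:,i)'*w; w = w - H(i,j)*V(:,i); end
    H(j+1,j) = norm(w); V(:,j+1) = w / H(j+1,j);
  end
  xF = V(:,1:k) * (H(1:k,1:k) \ eye(k,1));            % FOM iterate (||b|| = 1)
  theta = eig(H(1:k,1:k));                            % Ritz values
  pr = b;                                             % apply p_k(A) to r0 = b
  for j = 1:k, pr = pr - (A*pr)/theta(j); end
  disp(norm((b - A*xF) - pr))                         % tiny: the two vectors agree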

Solving Ax = b, an overview

- A^* = A?
  - yes: A > 0?
    - yes: CG
    - no: ill conditioned?
      - yes: SYMMLQ
      - no: MINRES
  - no: a good preconditioner is available?
    - yes: the preconditioner is flexible?
      - yes: GCR
      - no: GMRES
    - no: A + A^* is strongly indefinite?
      - yes: A has large imaginary eigenvalues?
        - yes: IDRstab
        - no: IDR
      - no: A has large imaginary eigenvalues?
        - yes: BiCGstab(l)
        - no: Bi-CGSTAB


Ax = b, with A an n x n non-singular matrix. Today's topic: iterative methods for general systems using short recurrences.

Program Lecture 8: CG | Bi-CG | Bi-Lanczos | Hybrid Bi-CG | Bi-CGSTAB, BiCGstab(l) | IDR   [next: CG]

A = A^* > 0, Conjugate Gradient [Hestenes & Stiefel '52]

x = 0, r = b, u = 0, ρ = 1
while ||r|| > tol do
  σ = -ρ, ρ = r^* r, β = ρ/σ
  u ← r - β u, c = A u
  σ = u^* c, α = ρ/σ
  r ← r - α c
  x ← x + α u
end while
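
A minimal MATLAB sketch of this loop (the function name and interface are illustrative; A is assumed to be given as a matrix):

  % Conjugate Gradients for A = A' > 0, following the loop above.
  function x = cg_sketch(A, b, tol)
    x = zeros(size(b)); r = b; u = zeros(size(b)); rho = 1;
    while norm(r) > tol
      sigma = -rho; rho = r' * r; beta = rho / sigma;   % beta = -rho_new/rho_old
      u = r - beta * u;                                 % new search direction
      c = A * u;                                        % the single MV per step
      sigma = u' * c; alpha = rho / sigma;
      r = r - alpha * c;
      x = x + alpha * u;
    end
  end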

Construction of CG. There are four alternative derivations of CG:
- GCR (use A^* = A) = CR; use the A^{-1} inner product + efficient implementation.
- Lanczos + T = LU + efficient implementation.
- Orthogonalize residuals.
- Nonlinear CG to solve x = argmin_x (1/2) ||b - A x||^2_{A^{-1}} ... [Exercise 7.3]

Conjugate Gradients, A^* = A, K^* = K

u_k = K^{-1} r_k - β_k u_{k-1},   r_{k+1} = r_k - α_k A u_k

Theorem. r_k, K u_k ∈ K_{k+1}(A K^{-1}, r_0). Suppose r_0, ..., r_{k-1} is a Krylov basis of K_k(A K^{-1}, r_0). If r_k, A u_k ⟂_{K^{-1}} r_{k-1}, then r_k, A u_k ⟂_{K^{-1}} K_k(A K^{-1}, r_0).
(Here ⟂_{K^{-1}} denotes orthogonality with respect to the K^{-1} inner product.)

Proof. r_k = r_{k-1} - α_{k-1} A u_{k-1} ⟂_{K^{-1}} r_{k-1} by construction of α_{k-1}; r_k = r_{k-1} - α_{k-1} A u_{k-1} ⟂_{K^{-1}} K_{k-1}(A K^{-1}, r_0) by induction.
A u_k = A K^{-1} r_k - β_k A u_{k-1} ⟂_{K^{-1}} r_{k-1} by construction of β_k; A u_k = A K^{-1} r_k - β_k A u_{k-1} ⟂_{K^{-1}} K_{k-1}(A K^{-1}, r_0) by induction: A K^{-1} r_k ⟂_{K^{-1}} K_{k-1}(A K^{-1}, r_0) ⇔ r_k ⟂_{K^{-1}} A K^{-1} K_{k-1}(A K^{-1}, r_0), and A K^{-1} K_{k-1}(A K^{-1}, r_0) ⊂ K_k(A K^{-1}, r_0), so this follows from r_k ⟂_{K^{-1}} K_k(A K^{-1}, r_0).

A = A^* and K = K^*: Preconditioned CG

x = 0, r = b, u = 0, ρ = 1
while ||r|| > tol do
  Solve K c = r for c
  σ = -ρ, ρ = c^* r, β = ρ/σ
  u ← c - β u, c = A u
  σ = u^* c, α = ρ/σ
  r ← r - α c
  x ← x + α u
end while
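
The same loop with the preconditioner solve made explicit (a sketch; Ksolve is an assumed function handle that applies K^{-1}, e.g. the backsolves of an incomplete Cholesky factorization):

  % Preconditioned CG for A = A' > 0 and K = K' > 0.
  function x = pcg_sketch(A, b, Ksolve, tol)
    x = zeros(size(b)); r = b; u = zeros(size(b)); rho = 1;
    while norm(r) > tol
      c = Ksolve(r);                                  % solve K c = r
      sigma = -rho; rho = c' * r; beta = rho / sigma;
      u = c - beta * u;
      c = A * u;
      sigma = u' * c; alpha = rho / sigma;
      r = r - alpha * c;
      x = x + alpha * u;
    end
  end

For instance, with an incomplete Cholesky factor L = ichol(sparse(A)), one could call x = pcg_sketch(A, b, @(v) L'\(L\v), 1e-8).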

Properties of CG

Pros:
- Low costs per step: 1 MV, 2 DOT, 3 AXPY to increase the dimension of the Krylov subspace by one.
- Low storage: 5 large vectors (incl. b).
- Minimal residual method if A, K pos. def.: ||r_k||_{A^{-1}} is minimal.
- Orthogonal residual method if A^* = A, K^* = K: r_k ⟂_{K^{-1}} K_k(A K^{-1}; r_0).
- No additional knowledge of properties of A is needed.
- Robust: CG always converges if A, K pos. def.

Cons:
- May break down if A = A^* is indefinite.
- Does not work if A^* ≠ A.
- CG is sensitive to evaluation errors if A = A^* is indefinite.
- Often loss of a) super-linear convergence and b) accuracy, for two reasons: 1) loss of orthogonality in the Lanczos recursion, 2) as in FOM, bumps and peaks in the CG convergence history.

Program Lecture 8: CG | Bi-CG | Bi-Lanczos | Hybrid Bi-CG | Bi-CGSTAB, BiCGstab(l) | IDR   [next: Bi-CG]

For general square non-singular A

- Apply CG to the normal equations (A^*A x = A^*b): CGNE.
- Apply CG to A A^* y = b (then x = A^* y): Craig's method.

Disadvantage. Search in K_k(A^*A, ...): if A^* = A, then convergence is determined by A^2, i.e., the condition number squared, ... . Expansion of K_k requires 2 MVs (i.e., many costly steps).

For a discussion of Craig's method, see Exercise 8.1. For Craig versus GCR, see Exercise 8.6.

[Faber & Manteuffel '90] Theorem. For general square non-singular A, there is no Krylov solver that finds the best solution in the Krylov subspace K_k(A, r_0) using short recurrences.

Alternative. Construct residuals in a sequence of shrinking spaces (orthogonal to a sequence of growing spaces): adapt the construction of CG.
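
For illustration, normal-equations CG (the first option above) in the same style; note that the loop works with the normal-equations residual A^*(b - A x), so the tolerance refers to that quantity (a sketch, illustrative names):

  % CG applied to A'*A x = A'*b without forming A'*A explicitly.
  function x = cgne_sketch(A, b, tol)
    x = zeros(size(A,2),1); r = A' * b; u = zeros(size(r)); rho = 1;
    while norm(r) > tol
      sigma = -rho; rho = r' * r; beta = rho / sigma;
      u = r - beta * u;
      c = A' * (A * u);            % 2 MVs per step: one with A, one with A'
      sigma = u' * c; alpha = rho / sigma;
      r = r - alpha * c;
      x = x + alpha * u;
    end
  end

The disadvantage quoted above is visible here: each step costs two MVs, and convergence is governed by the condition number of A^*A, i.e., the square of that of A.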

Conjugate Gradients, A^* = A, K = I

u_k = r_k - β_k u_{k-1},   r_{k+1} = r_k - α_k A u_k

Theorem. r_k, u_k ∈ K_{k+1}(A, r_0). Suppose r_0, ..., r_{k-1} is a Krylov basis of K_k(A, r_0). If r_k, A u_k ⟂ r_{k-1}, then r_k, A u_k ⟂ K_k(A, r_0).

Proof. A u_k = A r_k - β_k A u_{k-1} ⟂ r_{k-1} by construction of β_k; A u_k = A r_k - β_k A u_{k-1} ⟂ K_{k-1}(A, r_0) by induction: A r_k ⟂ K_{k-1}(A, r_0) ⇔ r_k ⟂ A K_{k-1}(A, r_0) ⊂ K_k(A, r_0).

Bi-Conjugate Gradients

u_k = r_k - β_k u_{k-1},   r_{k+1} = r_k - α_k A u_k

Theorem. We have r_k, u_k ∈ K_{k+1}(A, r_0). Suppose r̃_0, ..., r̃_{k-1} is a Krylov basis of K_k(A^*, r̃_0). If r_k, A u_k ⟂ r̃_{k-1}, then r_k, A u_k ⟂ K_k(A^*, r̃_0).

Proof. r_k = r_{k-1} - α_{k-1} A u_{k-1} ⟂ r̃_{k-1} by construction of α_{k-1}; r_k = r_{k-1} - α_{k-1} A u_{k-1} ⟂ K_{k-1}(A^*, r̃_0) by induction.
A u_k = A r_k - β_k A u_{k-1} ⟂ r̃_{k-1} by construction of β_k; A u_k = A r_k - β_k A u_{k-1} ⟂ K_{k-1}(A^*, r̃_0) by induction: A r_k ⟂ K_{k-1}(A^*, r̃_0) ⇔ r_k ⟂ A^* K_{k-1}(A^*, r̃_0) ⊂ K_k(A^*, r̃_0).

Bi-Conjugate Gradients

u_k = r_k - β_k u_{k-1},   r_{k+1} = r_k - α_k A u_k;   r_k, u_k ∈ K_{k+1}(A, r_0),   r_k, A u_k ⟂ r̃_{k-1}.

With ρ_k ≡ (r_k, r̃_k) and σ_k ≡ (A u_k, r̃_k), and since r̃_k + ϑ̄_k A^* r̃_{k-1} ∈ K_k(A^*, r̃_0) for some ϑ_k (the bar denotes the complex conjugate), we have that
α_k = (r_k, r̃_k)/(A u_k, r̃_k) = ρ_k/σ_k   and   β_k = (A r_k, r̃_{k-1})/(A u_{k-1}, r̃_{k-1}) = (r_k, A^* r̃_{k-1})/σ_{k-1} = -ρ_k/(ϑ_k σ_{k-1}).

Bi-Conjugate Gradients

u_k = r_k - β_k u_{k-1},   r_{k+1} = r_k - α_k A u_k;   r_k, u_k ∈ K_{k+1}(A, r_0),   r_k, A u_k ⟂ r̃_{k-1}.

With ρ_k ≡ (r_k, q_k(A^*) r̃_0) and σ_k ≡ (A u_k, q_k(A^*) r̃_0), and since q_k(ζ) + ϑ_k ζ q_{k-1}(ζ) ∈ P_{k-1} for some ϑ_k, we have that
α_k = ρ_k/σ_k   and   β_k = -ρ_k/(ϑ_k σ_{k-1}).

Bi-Conjugate Gradients. Classical Bi-CG [Fletcher '76] generates the shadow residuals r̃_k = q_k(A^*) r̃_0 with the same polynomial as r_k (q_k = p_k):
r_k = p_k(A) r_0,   r̃_k = p_k(A^*) r̃_0,
i.e., compute r̃_{k+1} as r̃_{k+1} = r̃_k - ᾱ_k A^* ũ_k, with ũ_k = r̃_k - β̄_k ũ_{k-1}. In particular, ϑ_k = α_{k-1}.
However, other choices for q_k are possible as well.
Example. q_k(ζ) = (1 - ω_{k-1} ζ) q_{k-1}(ζ)  (ζ ∈ C). Then ϑ_k = ω_{k-1} and r̃_k = r̃_{k-1} - ω̄_{k-1} A^* r̃_{k-1}, with, for instance, ω_{k-1} chosen to minimize ||r_k||_2.
The next transparency displays classical Bi-CG.

[Fletcher '76] Bi-CG

x = 0, r = b. Choose r̃.
u = 0, ρ = 1, ũ = 0
while ||r|| > tol do
  σ = -ρ, ρ = (r, r̃), β = ρ/σ
  u ← r - β u, c = A u, ũ ← r̃ - β̄ ũ, c̃ = A^* ũ
  σ = (c, r̃), α = ρ/σ
  r ← r - α c, r̃ ← r̃ - ᾱ c̃
  x ← x + α u
end while

[Fletcher '76] Bi-CG (equivalent formulation carrying c̃ = A^* ũ instead of ũ)

x = 0, r = b. Choose r̃.
u = 0, ρ = 1, c̃ = 0
while ||r|| > tol do
  σ = -ρ, ρ = (r, r̃), β = ρ/σ
  u ← r - β u, c = A u, c̃ ← A^* r̃ - β̄ c̃
  σ = (c, r̃), α = ρ/σ
  r ← r - α c, x ← x + α u
  r̃ ← r̃ - ᾱ c̃
end while
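
A MATLAB sketch of this second formulation (illustrative interface; the shadow residual is drawn randomly, as recommended on the next transparency):

  % Bi-CG, keeping ct = A'*u~ instead of the shadow direction u~ itself.
  function x = bicg_sketch(A, b, tol)
    x = zeros(size(b)); r = b; rt = rand(size(b));
    u = zeros(size(b)); ct = zeros(size(b)); rho = 1;
    while norm(r) > tol
      sigma = -rho; rho = rt' * r; beta = rho / sigma;
      u = r - beta * u;  c = A * u;                 % MV with A
      ct = A' * rt - conj(beta) * ct;               % MV with A'
      sigma = rt' * c; alpha = rho / sigma;
      r = r - alpha * c;  x = x + alpha * u;
      rt = rt - conj(alpha) * ct;                   % update the shadow residual
    end
  end

Breakdown shows up in this sketch as rho = 0 (Bi-Lanczos breakdown) or sigma = 0 (pivot breakdown); see the breakdown discussion further on.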

Selecting the initial shadow residual r̃_0. Often recommended: r̃_0 = r_0. Practical experience: select r̃_0 randomly (unless A^* = A).
Exercise. Bi-CG and CG coincide if A is Hermitian and r̃_0 = r_0.
Exercise. Derive a version of Bi-CG that includes a preconditioner K. Show that Bi-CG and CG coincide if A and K are Hermitian and r̃_0 = K^{-1} r_0.
Exercise 8.9 gives an alternative derivation of Bi-CG.

Properties of Bi-CG

Pros:
- Usually selects good approximations from the search subspaces (Krylov subspaces).
- Low costs per step: 2 DOT, 5 AXPY.
- Low storage: 8 large vectors.
- No knowledge of properties of A is needed.

Cons:
- Non-optimal Krylov subspace method.
- Not robust: Bi-CG may break down.
- Bi-CG is sensitive to evaluation errors (often loss of super-linear convergence).
- Convergence depends on the shadow residual r̃_0.
- 2 MVs needed to expand the search subspace by one vector; one MV is by A^*.

Program Lecture 8: CG | Bi-CG | Bi-Lanczos | Hybrid Bi-CG | Bi-CGSTAB, BiCGstab(l) | IDR   [next: Bi-Lanczos]

Bi-Lanczos

Find coefficients α_k, β_k, α̃_k and β̃_k such that (bi-orthogonalize)
γ_k v_{k+1} = A v_k - α_k v_k - β_k v_{k-1} ⟂ w_k, w_{k-1}, ...
γ̃_k w_{k+1} = A^* w_k - α̃_k w_k - β̃_k w_{k-1} ⟂ v_k, v_{k-1}, ...
Select appropriate scaling coefficients γ_k and γ̃_k. Then
A V_k = V_{k+1} H_k  with H_k Hessenberg,
A^* W_k = W_{k+1} H̃_k  with H̃_k Hessenberg, and W_{k+1}^* V_{k+1} = D_{k+1} diagonal.

Exercise. T_k ≡ W_k^* A V_k = D_k H_k = H̃_k^* D_k is tridiagonal. Exploit H_k = D_k^{-1} H̃_k^* D_k and the tridiagonal structure: Bi-Lanczos.
See Exercise 8.7 for details.

[Lanczos '50] Lanczos

ρ = ||r_0||, v_1 = r_0/ρ
β_0 = 0, v_0 = 0
for k = 1, 2, ... do
  ṽ = A v_k - β_{k-1} v_{k-1}
  α_k = v_k^* ṽ, ṽ ← ṽ - α_k v_k
  β_k = ||ṽ||, v_{k+1} = ṽ/β_k
end for
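
A direct MATLAB transcription of this loop (a sketch; no reorthogonalization, so in finite precision the columns of V gradually lose orthogonality, as noted earlier):

  % Lanczos for Hermitian A: A*V(:,1:m) = V*T with T tridiagonal
  % (diagonal alpha, off-diagonal beta).
  function [V, alpha, beta] = lanczos_sketch(A, r0, m)
    n = length(r0);
    V = zeros(n, m+1); alpha = zeros(m,1); beta = zeros(m,1);
    V(:,1) = r0 / norm(r0);
    vprev = zeros(n,1); beta_prev = 0;
    for k = 1:m
      vt = A * V(:,k) - beta_prev * vprev;
      alpha(k) = V(:,k)' * vt;
      vt = vt - alpha(k) * V(:,k);
      beta(k) = norm(vt);
      V(:,k+1) = vt / beta(k);
      vprev = V(:,k); beta_prev = beta(k);
    end
  end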

[Lanczos '50] Bi-Lanczos

Select an r_0 and an r̃_0.
v_1 = r_0/||r_0||, v_0 = 0, w_1 = r̃_0/||r̃_0||, w_0 = 0
γ_0 = γ̃_0 = 0, δ_0 = 1
for k = 1, 2, ... do
  δ_k = w_k^* v_k
  ṽ = A v_k, β_k = γ̃_{k-1} δ_k/δ_{k-1}, ṽ ← ṽ - β_k v_{k-1}
  α_k = w_k^* ṽ/δ_k, ṽ ← ṽ - α_k v_k
  w̃ = A^* w_k, β̃_k = γ_{k-1} δ_k/δ_{k-1}, w̃ ← w̃ - β̃_k w_{k-1}
  α̃_k = ᾱ_k, w̃ ← w̃ - α̃_k w_k
  Select γ_k ≠ 0 and γ̃_k ≠ 0
  v_{k+1} = ṽ/γ_k, w_{k+1} = w̃/γ̃_k
  V_k = [V_{k-1}, v_k], W_k = [W_{k-1}, w_k]
end for
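
And a two-sided MATLAB sketch for the real case (no look-ahead; the scaling γ_k = ||ṽ||, γ̃_k = ||w̃|| is one possible choice). It builds V and W with W'V diagonal and W'AV tridiagonal; a (near-)zero δ_k is the breakdown discussed below.

  % Bi-Lanczos sketch (real A, no look-ahead).
  function [V, W] = bilanczos_sketch(A, r0, rt0, m)
    n = length(r0);
    V = zeros(n, m+1); W = zeros(n, m+1);
    V(:,1) = r0 / norm(r0); W(:,1) = rt0 / norm(rt0);
    gam = 0; gamt = 0; delta_old = 1;
    for k = 1:m
      delta = W(:,k)' * V(:,k);                          % must stay away from 0
      vt = A * V(:,k); wt = A' * W(:,k);
      if k > 1
        vt = vt - (gamt * delta / delta_old) * V(:,k-1); % beta_k  * v_{k-1}
        wt = wt - (gam  * delta / delta_old) * W(:,k-1); % beta~_k * w_{k-1}
      end
      alpha = (W(:,k)' * vt) / delta;
      vt = vt - alpha * V(:,k);
      wt = wt - alpha * W(:,k);                          % alpha~ = alpha (real case)
      gam = norm(vt); gamt = norm(wt);
      V(:,k+1) = vt / gam; W(:,k+1) = wt / gamt;
      delta_old = delta;
    end
  end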

Arnoldi: A V_k = V_{k+1} H_k. If A^* = A, then T_k ≡ H_k is tridiagonal: Lanczos.
Lanczos + T = LU + efficient implementation ⇒ CG.
Bi-Lanczos + T = LU + efficient implementation ⇒ Bi-CG.


Bi-CG may break down

0) Lucky breakdown if r_k = 0.
1) Pivot breakdown or LU-breakdown, i.e., the LU-decomposition may not exist. Corresponds to σ = 0 in Bi-CG.
   Remedy. Composite step Bi-CG (skip forming T_k = L_k U_k once), or form T = QR as in MINRES (from the beginning): simple Quasi-Minimal Residuals (QMR).
2) Bi-Lanczos may break down, i.e., a diagonal element of D_k may be zero. Corresponds to ρ = 0 in Bi-CG.
   Remedy. Look-ahead.
General remedy. Restart. Look-ahead in QMR.

Note. CG may suffer from pivot breakdown when applied to a Hermitian, non-definite matrix (A = A^* with positive as well as negative eigenvalues): MINRES and SYMMLQ cure this breakdown.
Note. Exact breakdowns are rare. However, near-breakdowns lead to irregular convergence and instabilities. This leads to loss of speed of convergence and loss of accuracy.


Program Lecture 8: CG | Bi-CG | Bi-Lanczos | Hybrid Bi-CG | Bi-CGSTAB, BiCGstab(l) | IDR   [next: Hybrid Bi-CG]

Bi-Conjugate Gradients, K = I

u_k = r_k - β_k u_{k-1},   r_{k+1} = r_k - α_k A u_k;   r_k, u_k ∈ K_{k+1}(A, r_0),   r_k, A u_k ⟂ r̃_{k-1}.

With ρ_k ≡ (r_k, q_k(A^*) r̃_0) and σ_k ≡ (A u_k, q_k(A^*) r̃_0), and since q_k(ζ) + ϑ_k ζ q_{k-1}(ζ) ∈ P_{k-1} for some ϑ_k, we have that
α_k = ρ_k/σ_k   and   β_k = -ρ_k/(ϑ_k σ_{k-1}).

Transpose-free Bi-CG [Sonneveld '89]

ρ_k = (r_k, q_k(A^*) r̃_0) = (q_k(A) r_k, r̃_0),   σ_k = (A u_k, q_k(A^*) r̃_0) = (A q_k(A) u_k, r̃_0).   Write Q_k ≡ q_k(A).

(Bi-CG)  ρ_k;  Q_k u_k = Q_k r_k - β_k Q_k u_{k-1};  σ_k;  Q_k r_{k+1} = Q_k r_k - α_k A Q_k u_k.
(Pol)    Compute q_{k+1} of degree k+1 such that q_{k+1}(0) = 1. Compute Q_{k+1} u_k and Q_{k+1} r_{k+1} (from Q_k u_k, Q_k r_{k+1}, ...).

Example. q_{k+1}(ζ) = (1 - ω_k ζ) q_k(ζ)  (ζ ∈ C). Then
  ω_k;  Q_{k+1} u_k = Q_k u_k - ω_k A Q_k u_k;  Q_{k+1} r_{k+1} = Q_k r_{k+1} - ω_k A Q_k r_{k+1}.

Transpose-free Bi-CG; practice

Work with u_k ≡ Q_k u_k^{BiCG} and r_k ≡ Q_k r_{k+1}^{BiCG}; write u_{k-1} and r_k instead of Q_k u_{k-1}^{BiCG} and Q_k r_k^{BiCG}, respectively. Then ρ_k = (r_k, r̃_0) and σ_k = (A u_k, r̃_0):

(Bi-CG)  ρ_k = (r_k, r̃_0),  u_k = r_k - β_k u_{k-1};  σ_k = (A u_k, r̃_0),  r_k ← r_k - α_k A u_k,  x_k ← x_k + α_k u_k.
(Pol)    Compute the updating coefficients for q_{k+1}. Compute u_k, r_{k+1}, x_{k+1}.

Example.  ω_k;  u_{k+1} = u_k - ω_k A u_k,  r_{k+1} = r_k - ω_k A r_k,  x_{k+1} = x_k + ω_k r_k.

Example. q_{k+1}(ζ) = (1 - ω_k ζ) q_k(ζ)  (ζ ∈ C). How to choose ω_k?
Bi-CGSTABilized. With s_k ≡ A r_k,
ω_k ≡ argmin_ω ||r_k - ω A r_k||_2 = (s_k^* r_k)/(s_k^* s_k),
as in the Local Minimal Residual method, or, equivalently, as in GCR(1).

BiCGSTAB [van der Vorst '92]

x = 0, r = b. u = 0, ω = σ = 1. Choose r̃.
while ||r|| > tol do
  σ ← -ω σ, ρ = (r, r̃), β = ρ/σ
  u ← r - β u, c = A u
  σ = (c, r̃), α = ρ/σ
  r ← r - α c, x ← x + α u
  s = A r, ω = (r, s)/(s, s)
  u ← u - ω c
  x ← x + ω r
  r ← r - ω s
end while
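
The loop above in MATLAB (a sketch with an illustrative interface, no preconditioner):

  % BiCGSTAB following the listing above; 2 MVs per while-iteration.
  function x = bicgstab_sketch(A, b, tol)
    x = zeros(size(b)); r = b; rt = rand(size(b));
    u = zeros(size(b)); omega = 1; sigma = 1;
    while norm(r) > tol
      sigma = -omega * sigma; rho = rt' * r; beta = rho / sigma;
      u = r - beta * u; c = A * u;                  % Bi-CG part
      sigma = rt' * c; alpha = rho / sigma;
      r = r - alpha * c; x = x + alpha * u;
      s = A * r; omega = (s' * r) / (s' * s);       % GCR(1) / local MR part
      u = u - omega * c;
      x = x + omega * r;
      r = r - omega * s;
    end
  end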

Hybrid Bi-CG or product-type Bi-CG

r_k ≡ q_k(A) r_k^{Bi-CG} = q_k(A) p_k^{BiCG}(A) r_0;  p_k^{BiCG} is the kth Bi-CG residual polynomial.

How to select q_k? Choose q_k for efficient steps and fast convergence. Fast convergence by:
- reducing the residual,
- stabilizing the Bi-CG part,
- other: when used as linear solver for the Jacobian system in a Newton scheme for non-linear equations, by reducing the number of Newton steps.

Hybrid Bi-CG. Examples (polynomial factor q_k, combined with the Bi-CG part):
  CGS:          q_k from Bi-CG             Sonneveld [1989]
  Bi-CGSTAB:    q_k from GCR(1)            van der Vorst [1992]
  GPBi-CG:      q_k from 2-truncated GCR   Zhang [1997]
  BiCGstab(l):  q_k from GCR(l)            Sleijpen & Fokkema [1993]
For more details on hybrid Bi-CG, see Exercise 8.11 and Exercise 8.12. For a derivation of GPBi-CG, see Exercise 8.13.

Properties of hybrid Bi-CG

Pros:
- Often converges twice as fast as Bi-CG with respect to the number of MVs: each MV expands the search subspace.
  Bi-CG: x_k - x_0 ∈ K_k(A; r_0) at 2k MVs. Hybrid Bi-CG: x_k - x_0 ∈ K_{2k}(A; r_0) at 2k MVs.
- Work per MV and storage similar to Bi-CG.
- Transpose-free.
- Explicit computation of the Bi-CG scalars.

Cons:
- Non-optimal Krylov subspace method.
- Peaks in the convergence history.
- Large intermediate residuals.
- Breakdown possibilities.

Conjugate Gradients Squared [Sonneveld '89]

r_k = p_k^{BiCG}(A) p_k^{BiCG}(A) r_0.
CGS exploits recurrence relations for the Bi-CG polynomials to design a very efficient algorithm.

Properties:
+ Hybrid Bi-CG.
+ A very efficient algorithm: 1 DOT/MV, 3.25 AXPY/MV; storage: 7 large vectors.
- Often high peaks in its convergence history.
- Often large intermediate residuals.
+ Seems to do well as linear solver in a Newton scheme.

[Sonneveld '89] Conjugate Gradients Squared

x = 0, r = b. u = w = 0, ρ = 1. Choose r̃.
while ||r|| > tol do
  σ = -ρ, ρ = (r, r̃), β = ρ/σ
  w ← u - β w
  v = r - β u
  w ← v - β w, c = A w
  σ = (c, r̃), α = ρ/σ
  u = v - α c
  r ← r - α A(v + u)
  x ← x + α (v + u)
end while
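
The corresponding MATLAB sketch (illustrative interface); note the two MVs per iteration, both with A and none with A^*:

  % CGS following the listing above.
  function x = cgs_sketch(A, b, tol)
    x = zeros(size(b)); r = b; rt = rand(size(b));
    u = zeros(size(b)); w = zeros(size(b)); rho = 1;
    while norm(r) > tol
      sigma = -rho; rho = rt' * r; beta = rho / sigma;
      w = u - beta * w;
      v = r - beta * u;
      w = v - beta * w;  c = A * w;
      sigma = rt' * c; alpha = rho / sigma;
      u = v - alpha * c;
      r = r - alpha * (A * (v + u));
      x = x + alpha * (v + u);
    end
  end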

Program Lecture 8: CG | Bi-CG | Bi-Lanczos | Hybrid Bi-CG | Bi-CGSTAB, BiCGstab(l) | IDR   [next: Bi-CGSTAB, BiCGstab(l)]

Properties of Bi-CGSTAB

Pros:
- Hybrid Bi-CG.
- Converges faster (and smoother) than CGS.
- More accurate than CGS.
- 2 DOT/MV, 3 AXPY/MV. Storage: 6 large vectors.

Cons:
- Danger of (A) Lanczos breakdown (ρ_k = 0), (B) pivot breakdown (σ_k = 0), (C) breakdown of the minimization (ω_k = 0).

[Figure: convergence history, log10 of residual norm vs. number of MVs, for Bi-CG, BiCGSTAB, GCR(100) and 100-truncated GCR.]
Test problem: -(a u_x)_x - (a u_y)_y = 1 on [0,1] x [0,1]; a = 1000 for 0.1 ≤ x, y ≤ 0.9 and a = 1 elsewhere. Dirichlet BC on y = 0, Neumann BC on the other parts of the boundary. 82 x 82 volumes. ILU decomposition.

[Figure: the domain [0,1] x [0,1] with boundary values u = 0 and u = 1 on parts of the boundary, and region-wise values of the coefficient a (10e4, 10e-5, 10e2) and of the source f (F = 100; F = 0 elsewhere).]
Test problem: -(a u_x)_x - (a u_y)_y + b u_x = f on [0,1] x [0,1]; the figure defines a (= A) and f (= F); F = 0 except where indicated.

[Figure: log10 of residual norm vs. number of matrix multiplications for Bi-CG (dash-dot), Bi-CGSTAB (dashed), BiCGstab(2) (solid).]
Test problem: -(a u_x)_x - (a u_y)_y + b u_x = f on [0,1] x [0,1]; b(x,y) = 2 exp(2(x^2 + y^2)), a changes strongly. Dirichlet BC. 129 x 129 volumes. ILU decomposition.

[Figure: log10 of residual norm vs. number of matrix multiplications for Bi-CG (dash-dot), Bi-CGSTAB (dashed), BiCGstab(2) (solid).]
Test problem: -(a u_x)_x - (a u_y)_y + b u_x = f on [0,1] x [0,1]; b(x,y) = 2 exp(2(x^2 + y^2)), a changes strongly. Dirichlet BC. 201 x 201 volumes. ILU decomposition.

Breakdown of the minimization

Exact arithmetic, ω_k = 0:
- No reduction of the residual by Q_{k+1} r_{k+1} = (I - ω_k A) Q_k r_{k+1}^{BiCG}.  (*)
- q_{k+1} is of degree k: the Bi-CG scalars cannot be computed; breakdown of the incorporated Bi-CG.

Finite precision arithmetic, ω_k ≈ 0:
- Poor reduction of the residual by (*).
- The Bi-CG scalars are seriously affected by evaluation errors: drop of the speed of convergence.

ω_k ≈ 0 is to be expected if A is real and A has eigenvalues with relatively large imaginary part: ω_k is real!

Example. q_{k+1}(ζ) = (1 - ω_k ζ) q_k(ζ)  (ζ ∈ C). How to choose ω_k?
Bi-CGSTABilized. With s_k ≡ A r_k, ω_k ≡ argmin_ω ||r_k - ω A r_k||_2 = (s_k^* r_k)/(s_k^* s_k), as in the Local Minimal Residual method, or, equivalently, as in GCR(1).
BiCGstab(l). Cycle l times through the Bi-CG part to compute A^j u, A^j r for j = 0, ..., l, where now u ≡ Q_k u_{k+l-1}^{BiCG} and r ≡ Q_k r_{k+l}^{BiCG}, for k = ml. Then
γ_m ≡ argmin_γ ||r - [A r, ..., A^l r] γ||_2.

r_{k+l} = r - [A r, ..., A^l r] γ_m,   q_{k+l}(ζ) = (1 - [ζ, ..., ζ^l] γ_m) q_k(ζ)   (ζ ∈ C).

BiCGstab(l) for l ≥ 2 [Sleijpen & Fokkema '93; Sleijpen, van der Vorst & Fokkema '94]

  q_{k+1}(A) = A q_k(A)            (k ≠ ml)
  q_{ml+l}(A) = ϕ_m(A) q_{ml}(A)   (k = ml)

where ϕ_m is of exact degree l, ϕ_m(0) = 1, and ϕ_m minimizes ||ϕ_m(A) q_{ml}(A) r_{ml+l}^{BiCG}||_2 (the vector q_{ml}(A) r_{ml+l}^{BiCG} is the r from the previous transparency).
ϕ_m is a GCR residual polynomial of degree l. Note that real polynomials of degree 2 can have complex zeros.

Minimization in practice: ϕ_m(ζ) = 1 - Σ_{j=1}^{l} γ_j^{(m)} ζ^j,  (γ_j^{(m)}) ≡ argmin_{(γ_j)} ||r - Σ_{j=1}^{l} γ_j A^j r||_2.
Compute A r, A^2 r, ..., A^l r explicitly. With R ≡ [A r, ..., A^l r] and γ_m ≡ (γ_1^{(m)}, ..., γ_l^{(m)})^T we have
(R^* R) γ_m = R^* r   [normal equations, use Cholesky].

BiCGstab(l)

x = 0, r = [b]. Choose r̃. u = [0], γ_l = σ = 1.
while ||r|| > tol do
  σ ← -γ_l σ
  for j = 1 to l do
    ρ = (r_j, r̃), β = ρ/σ
    u ← r - β u,  u ← [u, A u_j]
    σ = (u_{j+1}, r̃), α = ρ/σ
    r ← r - α u_{2:j+1},  r ← [r, A r_j]
    x ← x + α u_1
  end for
  R ≡ r_{2:l+1}. Solve (R^* R) γ = R^* r_1 for γ
  u ← [u_1 - (γ_1 u_2 + ... + γ_l u_{l+1})]
  r ← [r_1 - (γ_1 r_2 + ... + γ_l r_{l+1})]
  x ← x + (γ_1 r_1 + ... + γ_l r_l)
end while

  epsilon = 10^(-16); ell = 4;
  x = zeros(size(b)); rt = rand(size(b));
  sigma = 1; omega = 1; u = zeros(size(b));
  y = MV(x); r = b - y;
  norm = r'*r; nepsilon = norm*epsilon^2; L = 2:ell+1;
  while norm > nepsilon
    sigma = -omega*sigma; y = r;
    for j = 1:ell
      rho = rt'*y; beta = rho/sigma;
      u = r - beta*u;
      y = MV(u(:,j)); u(:,j+1) = y;
      sigma = rt'*y; alpha = rho/sigma;
      r = r - alpha*u(:,2:j+1);
      x = x + alpha*u(:,1);
      y = MV(r(:,j)); r(:,j+1) = y;
    end
    G = r'*r; gamma = G(L,L)\G(L,1); omega = gamma(ell);
    u = u*[1; -gamma]; r = r*[1; -gamma]; x = x + r*[gamma; 0];
    norm = r'*r;
  end

[Figure: log10 of residual norm vs. number of matrix multiplications for Bi-CG (dash-dot), Bi-CGSTAB (dashed), BiCGstab(2) (solid).]
Test problem: u_xx + u_yy + u_zz + 1000 u_x = f; f such that u(x,y,z) = exp(xyz) sin(πx) sin(πy) sin(πz). (52 x 52 x 52) volumes. No preconditioning.

[Figure: log10 of residual norm vs. number of matrix multiplications for BiCGStab2 (dash-dot), Bi-CGSTAB (dotted), BiCGstab(2) (dashed), BiCGstab(4) (solid).]
Test problem: -(a u_x)_x - (a u_y)_y = 1 on [0,1] x [0,1]; a = 1000 for 0.1 ≤ x, y ≤ 0.9 and a = 1 elsewhere. Dirichlet BC on y = 0, Neumann BC on the other parts of the boundary. 200 x 200 volumes. ILU decomposition.

[Figure: log10 of residual norm vs. number of matrix multiplications for BiCGStab2 (dash-dot), Bi-CGSTAB (dotted), BiCGstab(2) (dashed), BiCGstab(4) (solid).]
Test problem: -ε(u_xx + u_yy) + a(x,y) u_x + b(x,y) u_y = 0 on [0,1] x [0,1], Dirichlet BC; ε = 10^{-1}, a(x,y) = 4x(x-1)(1-2y), b(x,y) = 4y(1-y)(1-2x), u(x,y) = sin(πx) + sin(13πx) + sin(πy) + sin(13πy). (201 x 201) volumes, no preconditioning.

[Figure: convergence history, log10 of residual norm vs. number of matrix multiplications.]
Test problem: u_xx + u_yy + u_zz + 1000 u_x = f; f is defined by the solution u(x,y,z) = exp(xyz) sin(πx) sin(πy) sin(πz). (10 x 10 x 10) volumes. No preconditioning.

Accurate Bi-CG coefficients

ρ_k = (r_k, r̃_0); the computed value is ρ̂_k = ρ_k (1 + ε) with
|ε| ≲ n ξ ||r_k||_2 ||r̃_0||_2 / |(r_k, r̃_0)| = n ξ / |ρ'_k|,   where ρ'_k ≡ (r_k, r̃_0) / (||r_k||_2 ||r̃_0||_2)
and ξ is the relative machine precision.
[Figure: convergence history illustrating the effect.]

Why use polynomial factors of degree l ≥ 2?

Hybrid Bi-CG that is faster than Bi-CGSTAB. One sweep of BiCGstab(l) versus l steps of Bi-CGSTAB:
- reduction with an MR-polynomial of degree l is better than with l MR-polynomials of degree 1;
- the MR-polynomial of degree l contributes only once to an increase of ρ_k.

Why not?
- Efficiency: 1.75 + 0.25 l DOT/MV, 2.5 + 0.5 l AXPY/MV. Storage: 2l + 5 large vectors.
- Loss of accuracy: ||r_k - (b - A x_k)|| ≤ (... + c ξ max_i |γ_i| ||A|| ||A^{i-1} r||).
- Breakdowns are possible.


Program Lecture 8: CG | Bi-CG | Bi-Lanczos | Hybrid Bi-CG | Bi-CGSTAB, BiCGstab(l) | IDR   [next: IDR]

Hybrid Bi-CG

Notation. If p_k is a polynomial of exact degree k and r̃_0 an n-vector, let
S(p_k, A, r̃_0) ≡ {p_k(A) v | v ⟂ K_k(A^*, r̃_0)}.
Theorem. Hybrid Bi-CG finds residuals r_k ∈ S(p_k, A, r̃_0).

Example. Bi-CGSTAB: p_k(λ) = (1 - ω_k λ) p_{k-1}(λ), where, in every step,
ω_k = argmin_ω ||r - ω A r||_2,  with r = p_{k-1}(A) v, v = r_k^{Bi-CG}.

Example. BiCGstab(l): p_k(λ) = (1 - ω_k λ) p_{k-1}(λ), where, every lth step,
γ = argmin_γ ||r - [A r, ..., A^l r] γ||_2,  with r = p_{k-l}(A) r_k^{Bi-CG}, and
(1 - γ_1 λ - ... - γ_l λ^l) = (1 - ω_k λ) ··· (1 - ω_{k-l+1} λ).

Induced Dimension Reduction

Definition. If p_k is a polynomial of exact degree k and R̃ ≡ R̃_0 = [r̃_1, ..., r̃_s] an n x s matrix, then
S(p_k, A, R̃) ≡ {p_k(A) v | v ⟂ K_k(A^*, R̃)}
is the p_k-Sonneveld subspace. Here K_k(A^*, R̃) ≡ {Σ_{j=0}^{k-1} (A^*)^j R̃ γ_j | γ_j ∈ C^s}.
Theorem. IDR finds residuals r_k ∈ S(p_k, A, R̃).

Example. Bi-CGSTAB: p_k(λ) = (1 - ω_k λ) p_{k-1}(λ), where, in every step,
ω_k = argmin_ω ||r - ω A r||_2,  with r = p_{k-1}(A) v, v = r_k^{Bi-CG}.

[van Gijzen & Sonneveld '07] IDR

Select an x_0. Select n x s matrices U and R̃. Compute C ≡ A U.
x = x_0, r = b - A x, j = s, i = 1
while ||r|| > tol do
  Solve R̃^* C γ = R̃^* r for γ
  v = r - C γ, s = A v
  j++; if j > s, ω = (s^* v)/(s^* s), j = 0
  U e_i ← U γ + ω v,  x ← x + U e_i
  r_0 = r, r = v - ω s, C e_i ← r_0 - r
  i++; if i > s, i = 1
end while

Select n x s matrices U and R̃. Experiments suggest R̃ = qr(rand(n, s), 0). U and C can be constructed from s steps of GCR.
We will discuss IDR in more detail in Lecture 11. See also Exercises 11.1 to 11.5.