Iterative Methods for Linear Systems of Equations


Iterative Methods for Linear Systems of Equations. Projection methods (3). ITMAN PhD-course, DTU, 20-10-08 till 24-10-08. Martin van Gijzen, Delft University of Technology

Overview day 4:
- Bi-Lanczos method
- Computation of an approximate solution: Bi-CG
- QMR
- CGS and Bi-CGSTAB
- Convergence
- The induced dimension theorem and IDR(s)

Introduction. Yesterday, we discussed Arnoldi-based methods. A method like GMRES minimises the residual norm, but has to compute and store a new orthogonal basis vector in each iteration. The cost of doing this becomes prohibitive if many iterations have to be performed. Today, we will discuss another class of iterative methods. They are based on the Bi-Lanczos method, and hence closely related to CG. They use short recurrences, but the iterates have no optimality property. We will also discuss a new method: IDR(s). This method is not based on the Bi-Lanczos method, but is mathematically related.

The Bi-Lanczos method. The symmetric Lanczos method constructs a matrix $Q_k$ such that $Q_k^T Q_k = I$ and $Q_k^T A Q_k = T_k$ is tridiagonal. The Bi-Lanczos algorithm constructs matrices $W_k$ and $V_k$ such that $W_k^T A V_k = T_k$ is tridiagonal and such that $W_k^T V_k = I$. Hence $W_k$ and $V_k$ are not orthogonal themselves, but they form a bi-orthogonal pair.

The Bi-Lanczos method (2). Choose two vectors $v_1$ and $w_1$ such that $v_1^T w_1 = 1$; set $\beta_1 = \delta_1 = 0$ and $w_0 = v_0 = 0$ (initialization).
FOR $k = 1, 2, \dots$ DO
  $\alpha_k = w_k^T A v_k$
  $\hat{v}_{k+1} = A v_k - \alpha_k v_k - \beta_k v_{k-1}$ (new direction)
  $\hat{w}_{k+1} = A^T w_k - \alpha_k w_k - \delta_k w_{k-1}$ (new shadow direction)
  $\delta_{k+1} = |\hat{v}_{k+1}^T \hat{w}_{k+1}|^{1/2}$
  $\beta_{k+1} = \hat{v}_{k+1}^T \hat{w}_{k+1} / \delta_{k+1}$
  $v_{k+1} = \hat{v}_{k+1} / \delta_{k+1}$ (new v orthogonal to previous w's)
  $w_{k+1} = \hat{w}_{k+1} / \beta_{k+1}$
END FOR
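As a concrete illustration of this recurrence, here is a minimal NumPy sketch (the function name and interface are my own, there is no breakdown test, and v1 and w1 are assumed to be scaled so that $w_1^T v_1 = 1$):

```python
import numpy as np

def bi_lanczos(A, v1, w1, m):
    """Sketch of m steps of Bi-Lanczos; no breakdown handling."""
    n = A.shape[0]
    V, W = [v1], [w1]                       # v_1, w_1 with w_1^T v_1 = 1
    v_old = w_old = np.zeros(n)             # v_0 = w_0 = 0
    beta = delta = 0.0                      # beta_1 = delta_1 = 0
    alphas, betas, deltas = [], [], []
    for _ in range(m):
        v, w = V[-1], W[-1]
        alpha = w @ (A @ v)
        v_hat = A @ v - alpha * v - beta * v_old      # new direction for V
        w_hat = A.T @ w - alpha * w - delta * w_old   # new direction for W
        delta = np.sqrt(abs(v_hat @ w_hat))           # zero => breakdown
        beta = (v_hat @ w_hat) / delta
        v_old, w_old = v, w
        V.append(v_hat / delta)                       # keeps W_k^T V_k = I
        W.append(w_hat / beta)
        alphas.append(alpha); betas.append(beta); deltas.append(delta)
    return np.column_stack(V), np.column_stack(W), alphas, betas, deltas
```

The returned scalars are exactly the entries of the tridiagonal matrix $T_k$ on the next slide.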

The Bi-Lanczos method (3). Let
$$T_k = \begin{pmatrix} \alpha_1 & \beta_2 & & 0 \\ \delta_2 & \alpha_2 & \ddots & \\ & \ddots & \ddots & \beta_k \\ 0 & & \delta_k & \alpha_k \end{pmatrix}$$
and $V_k = [v_1\ v_2\ \dots\ v_k]$, $W_k = [w_1\ w_2\ \dots\ w_k]$. Then
$$A V_k = V_k T_k + \delta_{k+1} v_{k+1} e_k^T \qquad\text{and}\qquad A^T W_k = W_k T_k^T + \beta_{k+1} w_{k+1} e_k^T.$$

The Bi-Lanczos method (4). Note that $\{v_1, \dots, v_k\}$ is a basis for $\mathcal{K}^k(A; v_1)$ and $\{w_1, \dots, w_k\}$ is a basis for $\mathcal{K}^k(A^T; w_1)$.

Breakdown of Bi-Lanczos. Bi-Lanczos breaks down if $w_{k+1}^T v_{k+1} = 0$. This can happen in the following cases:
- $v_{k+1} = 0$: $\{v_1, \dots, v_k\}$ span an invariant subspace (w.r.t. $A$);
- $w_{k+1} = 0$: $\{w_1, \dots, w_k\}$ span an invariant subspace (w.r.t. $A^T$);
- $w_{k+1} \neq 0$ and $v_{k+1} \neq 0$: serious breakdown.

Bi-CG. A CG-type method can now be defined as follows. Take $v_1 = r_0 / \|r_0\|$ and $w_1 = \tilde{r}_0 / \|\tilde{r}_0\|$ (with e.g. $\tilde{r}_0 = r_0$).
- Compute $W_k$ and $V_k$ using Bi-Lanczos.
- Compute $x_k = x_0 + V_k y_k$ with $y_k$ such that $W_k^T A V_k y_k = W_k^T r_0$, i.e. $T_k y_k = \|r_0\| e_1$.
This algorithm can be cast in a more practical form in the same way as CG. The resulting algorithm is called Bi-CG.

The Bi-CG algorithm. $r_0 = b - A x_0$; choose $\tilde{r}_0$; $p_0 = r_0$, $\tilde{p}_0 = \tilde{r}_0$ (initialization).
FOR $k = 0, 1, \dots$ DO
  $\alpha_k = \dfrac{\tilde{r}_k^T r_k}{\tilde{p}_k^T A p_k}$
  $x_{k+1} = x_k + \alpha_k p_k$ (update iterate)
  $r_{k+1} = r_k - \alpha_k A p_k$ (update residual)
  $\tilde{r}_{k+1} = \tilde{r}_k - \alpha_k A^T \tilde{p}_k$ (update shadow residual)
  $\beta_k = \dfrac{\tilde{r}_{k+1}^T r_{k+1}}{\tilde{r}_k^T r_k}$
  $p_{k+1} = r_{k+1} + \beta_k p_k$ (update direction vector)
  $\tilde{p}_{k+1} = \tilde{r}_{k+1} + \beta_k \tilde{p}_k$ (update shadow direction vector)
END FOR
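A direct transcription of this loop into NumPy might look as follows (a sketch only: no breakdown checks, the names are mine, and by default the shadow residual is taken equal to $r_0$):

```python
import numpy as np

def bicg(A, b, x0, r_shadow=None, tol=1e-8, maxit=1000):
    """Plain Bi-CG as on the slide; sketch without breakdown checks."""
    x = x0.astype(float)
    r = b - A @ x
    rs = r.copy() if r_shadow is None else r_shadow   # shadow residual
    p, ps = r.copy(), rs.copy()
    rho = rs @ r
    for _ in range(maxit):
        Ap = A @ p
        alpha = rho / (ps @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs -= alpha * (A.T @ ps)                      # shadow residual update
        if np.linalg.norm(r) < tol:
            break
        rho_new = rs @ r
        beta = rho_new / rho
        rho = rho_new
        p = r + beta * p                              # new direction vector
        ps = rs + beta * ps                           # new shadow direction
    return x
```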

Properties of Bi-CG.
- The method uses limited memory: only five vectors need to be stored.
- The method is not optimal. However, if $A$ is SPD and we take $\tilde{r}_0 = r_0$, we get CG back (but with twice the number of operations per iteration!).
- The method is finite: all residuals are orthogonal to the shadow residuals, $\tilde{r}_i^T r_j = 0$ if $i \neq j$.
- The method is not robust: $\tilde{r}_k^T r_k$ may be zero while $r_k \neq 0$.

Quasi-minimising the residuals. In analogy to CG and MINRES (and to FOM and GMRES), we can define a "quasi"-minimal residual algorithm. To this end, we first rewrite the Bi-Lanczos relations.

The Bi-Lanczos relations. Define the $(k+1) \times k$ matrix $\underline{T}_k$ as
$$\underline{T}_k = \begin{pmatrix} \alpha_1 & \beta_2 & & \\ \delta_2 & \alpha_2 & \ddots & \\ & \ddots & \ddots & \beta_k \\ & & \delta_k & \alpha_k \\ & & & \delta_{k+1} \end{pmatrix}.$$
We can now write $A V_k = V_{k+1} \underline{T}_k$.

Quasi-minimal residuals (1). We would like to solve the problem: find $x_k = x_0 + V_k y_k$ such that $\|r_k\|$ is minimal. Now
$$r_k = b - A x_k = r_0 - A V_k y_k = \|r_0\| v_1 - A V_k y_k.$$
Hence the problem is to minimise
$$\|r_k\| = \big\|\, \|r_0\| v_1 - A V_k y_k \,\big\| = \big\|\, V_{k+1}\left( \|r_0\| e_1 - \underline{T}_k y_k \right) \big\|.$$
If $V_{k+1}$ had orthonormal columns, this would be equivalent to: minimise $\big\|\, \|r_0\| e_1 - \underline{T}_k y_k \,\big\|$.

Quasi-minimal residuals (2). However, in general $V_{k+1}$ does not have orthonormal columns. We can ignore this fact and still compute $y_k$ by minimising $\big\|\, \|r_0\| e_1 - \underline{T}_k y_k \,\big\|$. The resulting algorithm is called QMR. For many problems, the convergence of QMR is smoother than that of Bi-CG, but not essentially faster.
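To make the quasi-minimisation step concrete, the small sketch below solves the reduced least-squares problem directly; the names Tbar_k and beta0 are mine, and a practical QMR implementation updates the factorisation of $\underline{T}_k$ with Givens rotations as $k$ grows instead of re-solving.

```python
import numpy as np

def qmr_step(Tbar_k, beta0):
    """Quasi-minimisation: given the (k+1) x k Bi-Lanczos matrix Tbar_k
    and beta0 = ||r_0||, compute y_k minimising || beta0*e_1 - Tbar_k y ||."""
    rhs = np.zeros(Tbar_k.shape[0])
    rhs[0] = beta0                       # ||r_0|| e_1
    y, *_ = np.linalg.lstsq(Tbar_k, rhs, rcond=None)
    return y                             # then x_k = x_0 + V_k @ y
```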

Bi-CG wastes time. Bi-CG constructs its approximations as a linear combination of the columns of $V_k$. The only reason we construct the shadow vectors is to compute the parameters $\alpha_k$ and $\beta_k$. Another disadvantage of Bi-CG and QMR is that operations with $A^T$ have to be performed, which may be difficult in matrix-free computations. Next we look at the following questions: Can the computational work of Bi-CG be better exploited? Can operations with $A^T$ be avoided?

Bi-CG: a closer look. In the Bi-CG method the residual vector can be expressed as $r_k = \phi_k(A) r_0$ and the direction vector as $p_k = \pi_k(A) r_0$, with $\phi_k$ and $\pi_k$ polynomials of degree $k$. The shadow vectors $\tilde{r}_k$ and $\tilde{p}_k$ are defined through the same recurrences as $r_k$ and $p_k$. Hence $\tilde{r}_k = \phi_k(A^T)\, \tilde{r}_0$ and $\tilde{p}_k = \pi_k(A^T)\, \tilde{r}_0$.

Towards a faster algorithm. The scalar $\alpha_k$ in Bi-CG is given by
$$\alpha_k = \frac{(\phi_k(A^T)\tilde{r}_0)^T (\phi_k(A) r_0)}{(\pi_k(A^T)\tilde{r}_0)^T (A \pi_k(A) r_0)} = \frac{\tilde{r}_0^T \phi_k^2(A) r_0}{\tilde{r}_0^T A \pi_k^2(A) r_0}.$$
The idea is now to find an algorithm whose residuals $\hat{r}_k$ satisfy $\hat{r}_k = \phi_k^2(A) r_0$. The algorithm based on this idea is called CGS (by Sonneveld, 1989).

Conjugate Gradients Squared. $x_0$ is an initial guess; $r_0 = b - A x_0$; $\tilde{r}_0$ arbitrary, such that $\tilde{r}_0^T r_0 \neq 0$; $\rho_0 = \tilde{r}_0^T r_0$; $\beta_{-1} = \rho_0$; $p_{-1} = q_0 = 0$.
FOR $i = 0, 1, 2, \dots$ DO
  $u_i = r_i + \beta_{i-1} q_i$;
  $p_i = u_i + \beta_{i-1}(q_i + \beta_{i-1} p_{i-1})$;
  $\hat{v} = A p_i$;
  $\alpha_i = \dfrac{\rho_i}{\tilde{r}_0^T \hat{v}}$;
  $q_{i+1} = u_i - \alpha_i \hat{v}$;
  $s_i = u_i + q_{i+1}$;
  $x_{i+1} = x_i + \alpha_i s_i$;
  $r_{i+1} = r_i - \alpha_i A s_i$;
  $\rho_{i+1} = \tilde{r}_0^T r_{i+1}$;
  $\beta_i = \rho_{i+1} / \rho_i$;
END FOR
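In NumPy the loop above can be sketched as follows (again no breakdown safeguards; by default the shadow residual $\tilde{r}_0$ is taken equal to $r_0$, and the value used for $\beta_{-1}$ is immaterial because $q_0 = p_{-1} = 0$):

```python
import numpy as np

def cgs(A, b, x0, r_shadow=None, tol=1e-8, maxit=1000):
    """CGS as on the slide; sketch without breakdown safeguards."""
    x = x0.astype(float)
    r = b - A @ x
    rs = r.copy() if r_shadow is None else r_shadow
    rho = rs @ r
    beta = rho                                 # beta_{-1} = rho_0
    p = np.zeros_like(r); q = np.zeros_like(r)
    for _ in range(maxit):
        u = r + beta * q
        p = u + beta * (q + beta * p)
        v_hat = A @ p
        alpha = rho / (rs @ v_hat)
        q = u - alpha * v_hat
        s = u + q
        x += alpha * s
        r -= alpha * (A @ s)
        if np.linalg.norm(r) < tol:
            break
        rho_new = rs @ r
        beta = rho_new / rho
        rho = rho_new
    return x
```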

CGS (2). For many problems CGS converges considerably faster than Bi-CG, sometimes twice as fast. However, convergence is more erratic: Bi-CG shows peaks in its convergence curves, and these peaks are squared by CGS. The peaks can destroy the accuracy of the solution and also the convergence of the method. The search for a more smoothly converging method led to the development of Bi-CGSTAB by Van der Vorst.

Bi-CGSTAB. Bi-CGSTAB is based on the following observation. Instead of squaring the Bi-CG polynomial, we can construct other iteration methods in which the $x_k$ are generated such that $r_k = \psi_k(A)\phi_k(A) r_0$. Bi-CGSTAB takes a polynomial of the form $\psi_k(x) = (1 - \omega_1 x)(1 - \omega_2 x)\cdots(1 - \omega_k x)$. The $\omega_k$ are chosen to minimise the current residual, in the same way as in the one-step minimal residual method.

Bi-CGSTAB: the algorithm. $x_0$ is an initial guess; $r_0 = b - A x_0$; $\tilde{r}_0$ arbitrary, such that $\tilde{r}_0^T r_0 \neq 0$; $\rho_{-1} = \alpha_{-1} = \omega_{-1} = 1$; $v_{-1} = p_{-1} = 0$.
FOR $k = 0, 1, 2, \dots$ DO
  $\rho_k = \tilde{r}_0^T r_k$;
  $\beta_{k-1} = (\rho_k / \rho_{k-1})(\alpha_{k-1} / \omega_{k-1})$;
  $p_k = r_k + \beta_{k-1}(p_{k-1} - \omega_{k-1} v_{k-1})$;
  $v_k = A p_k$;
  $\alpha_k = \rho_k / (\tilde{r}_0^T v_k)$;
  $s = r_k - \alpha_k v_k$;
  $t = A s$;
  $\omega_k = (t^T s)/(t^T t)$;
  $x_{k+1} = x_k + \alpha_k p_k + \omega_k s$;
  $r_{k+1} = s - \omega_k t$;
END FOR
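The same loop in NumPy (a sketch: no checks on $\rho_k$ or $\omega_k$ becoming zero, and the shadow residual is $r_0$ by default):

```python
import numpy as np

def bicgstab(A, b, x0, r_shadow=None, tol=1e-8, maxit=1000):
    """Bi-CGSTAB as on the slide; sketch without breakdown safeguards."""
    x = x0.astype(float)
    r = b - A @ x
    rs = r.copy() if r_shadow is None else r_shadow
    rho_old = alpha = omega = 1.0
    v = np.zeros_like(r); p = np.zeros_like(r)
    for _ in range(maxit):
        rho = rs @ r
        beta = (rho / rho_old) * (alpha / omega)
        p = r + beta * (p - omega * v)
        v = A @ p
        alpha = rho / (rs @ v)
        s = r - alpha * v
        t = A @ s
        omega = (t @ s) / (t @ t)          # one-step residual minimisation
        x += alpha * p + omega * s
        r = s - omega * t
        rho_old = rho
        if np.linalg.norm(r) < tol:
            break
    return x
```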

Convergence of Bi-CG methods. No bounds on the residual norm can be given for the four methods discussed today: they have no optimality property. Some general observations can be made:
- All four methods are finite.
- They all show superlinear convergence.
- The rate of convergence of Bi-CG and QMR is basically the same, but the convergence of QMR is smoother.
- Convergence of Bi-CGSTAB and CGS is most of the time faster than that of the other two methods, but the convergence of CGS can be erratic, which has a negative effect on the accuracy.

Convergence: first example. The figure below shows typical convergence curves for Bi-CGSTAB, CGS, QMR and Bi-CG. [Figure: residual norm versus number of matrix-vector multiplications for Bi-CGSTAB, CGS, Bi-CG and QMR.]

Convergence: second example. The figure below shows another example of the convergence of Bi-CGSTAB, CGS, QMR and Bi-CG. [Figure: residual norm versus number of matrix-vector multiplications for Bi-CGSTAB, CGS, Bi-CG and QMR.]
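Comparisons of this kind are easy to set up with the implementations that ship with SciPy (bicg, cgs, bicgstab and qmr in scipy.sparse.linalg). The snippet below is only a sketch on an artificial nonsymmetric tridiagonal matrix of my own choosing, not the test problems behind the figures:

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n = 200
A = sp.diags([-1.2, 2.0, -0.8], [-1, 0, 1], shape=(n, n), format='csr')
b = np.ones(n)

for name, solver in [('Bi-CG', spla.bicg), ('CGS', spla.cgs),
                     ('Bi-CGSTAB', spla.bicgstab), ('QMR', spla.qmr)]:
    history = []                                   # residual norm per iteration
    cb = lambda xk: history.append(np.linalg.norm(b - A @ xk))
    x, info = solver(A, b, maxiter=500, callback=cb)
    print(f'{name:10s} iterations: {len(history):4d}   '
          f'final residual: {history[-1]:.2e}')
```

Note that the figures above count matrix-vector multiplications, while this snippet counts iterations; each of these methods needs two (transpose-)matrix-vector products per iteration, so the counts differ by roughly a factor of two.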

BiCGSTAB2 and BiCGstab(l). For some problems, in particular for real problems with large complex eigenvalues, Bi-CGSTAB does not work well. The reason is that for such problems it is not possible to construct a residual-minimising polynomial of the form $\psi_k(x) = (1 - \omega_1 x)(1 - \omega_2 x)\cdots(1 - \omega_k x)$: such a polynomial has only real roots, and can therefore not approximate large complex eigenvalues. To overcome this problem, Gutknecht proposed BiCGSTAB2, which uses a polynomial with quadratic factors. Sleijpen and Fokkema proposed BiCGstab(l), which uses polynomial factors of degree l.

Intermezzo. Of the Bi-CG-type methods, Bi-CGSTAB is by far the most widely used. Recently, the new method IDR(s) has been proposed by Sonneveld and van Gijzen. IDR(1) and Bi-CGSTAB are mathematically equivalent, but for s > 1 the convergence of IDR(s) is often much faster than that of Bi-CGSTAB. Although Bi-CGSTAB and IDR(1) are mathematically equivalent, IDR(s) is based on a completely different approach.

The IDR approach for solving $Ax = b$. Generate residuals $r_n = b - A x_n$ that lie in subspaces $G_j$ of decreasing dimension. These nested subspaces are related by
$$G_j = (I - \omega_j A)(G_{j-1} \cap S),$$
where $S$ is a fixed proper subspace of $\mathbb{C}^N$ and the $\omega_j \in \mathbb{C}$ are non-zero scalars. Ultimately $r_n \in \{0\}$ (IDR theorem).

The IDR theorem. Theorem 1 (IDR). Let $A$ be any matrix in $\mathbb{C}^{N \times N}$, let $v_0$ be any nonzero vector in $\mathbb{C}^N$, and let $G_0$ be the complete Krylov space $\mathcal{K}^N(A; v_0)$. Let $S$ denote any (proper) subspace of $\mathbb{C}^N$ such that $S$ and $G_0$ do not share a nontrivial invariant subspace of $A$, and define the sequence $G_j$, $j = 1, 2, \dots$ as
$$G_j = (I - \omega_j A)(G_{j-1} \cap S),$$
where the $\omega_j$'s are nonzero scalars. Then
(i) $G_j \subset G_{j-1}$ for all $j > 0$;
(ii) $G_j = \{0\}$ for some $j \leq N$.

Making an IDR algorithm

Making an IDR algorithm (2). Definition of S: $S$ can be defined as the orthogonal complement of $\operatorname{span}(p_1, \dots, p_s)$. Let $P$ be the matrix with $p_1, \dots, p_s$ as its columns. Then $v \in S \iff P^H v = 0$.
Residual difference vectors: we compute residual difference vectors $\Delta r_n = r_{n+1} - r_n = -A \Delta x_n$, so that the solution vector can be updated to $x_{n+1}$ along with the residual $r_{n+1}$.

Making an IDR algorithm (3). Intermediate residuals: intermediate residuals $r_n$ can be generated by repeating the algorithm. Once $s + 1$ residuals in $G_j$ have been computed, the next residual will be in $G_{j+1}$.
Choice of $\omega$: every $(s+1)$-st step a new $\omega$ may be chosen. We choose it such that the norm of the next residual is minimized.

Prototype IDR(s) algorithm.
while $\|r_n\| > \mathrm{TOL}$ and $n < \mathrm{MAXIT}$ do
  for $k = 0$ to $s$ do
    Solve $c$ from $P^H \Delta R_n c = P^H r_n$;
    $v = r_n - \Delta R_n c$; $t = A v$;
    if $k = 0$ then
      $\omega = (t^H v)/(t^H t)$;
    end if
    $\Delta r_n = -\Delta R_n c - \omega t$; $\Delta x_n = -\Delta X_n c + \omega v$;
    $r_{n+1} = r_n + \Delta r_n$; $x_{n+1} = x_n + \Delta x_n$;
    $n = n + 1$;
    $\Delta R_n = (\Delta r_{n-1} \cdots \Delta r_{n-s})$; $\Delta X_n = (\Delta x_{n-1} \cdots \Delta x_{n-s})$;
  end for
end while
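A NumPy sketch of this prototype is shown below. The first $s$ difference vectors are built with simple local minimal-residual steps (one possible choice for the start-up), $P$ is taken random, and there are no safeguards against a singular $P^H \Delta R_n$. Everything is kept real, so $P^H$ becomes a plain transpose.

```python
import numpy as np

def idr_s(A, b, x0, s=4, tol=1e-8, maxit=1000, seed=0):
    """Sketch of the prototype IDR(s) loop above (real case, no safeguards)."""
    rng = np.random.default_rng(seed)
    P = rng.standard_normal((A.shape[0], s))     # S = null space of P^T
    x = x0.astype(float)
    r = b - A @ x
    dR = np.zeros((A.shape[0], s)); dX = np.zeros((A.shape[0], s))
    for k in range(s):                           # build the first s differences
        t = A @ r
        om = (t @ r) / (t @ t)                   # local minimal-residual step
        dX[:, k] = om * r
        dR[:, k] = -om * t
        x += dX[:, k]; r += dR[:, k]
    n, om = s, 1.0
    while np.linalg.norm(r) > tol and n < maxit:
        for k in range(s + 1):
            c = np.linalg.solve(P.T @ dR, P.T @ r)
            v = r - dR @ c
            t = A @ v
            if k == 0:
                om = (t @ v) / (t @ t)           # minimise the next residual
            dr = -dR @ c - om * t
            dx = -dX @ c + om * v
            r = r + dr; x = x + dx; n += 1
            dR = np.column_stack([dr, dR[:, :-1]])   # keep s newest differences
            dX = np.column_stack([dx, dX[:, :-1]])
    return x
```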

Convergence of IDR(s). The figure below shows typical convergence of Bi-CGSTAB, GMRES (optimal) and IDR(s) for a convection-diffusion problem. [Figure: $\log_{10} \|r\|_2$ versus number of MATVECS for GMRES, Bi-CGSTAB, BiCGstab(2), IDR(1), IDR(2), IDR(4) and IDR(8).]

Concluding remarks (1). Today we discussed short-recurrence methods for solving nonsymmetric problems. Of these methods, Bi-CGSTAB is by far the most popular. A logical question to ask is: are there better methods than Bi-CGSTAB that use short recurrences? The answer is: sure! For example IDR(s). However, neither Bi-CGSTAB nor IDR(s) reduces the error in an optimal way. Another question is: is it possible to derive an optimal method for general matrices that uses short recurrences? The answer is: unfortunately not. This has been proven by Faber and Manteuffel in the 1980s.

Concluding remarks (2). When should we use IDR(s) and when GMRES? It is difficult to give a general answer; rules of thumb are:
- Are matrix-vector multiplications (and/or preconditioning operations) expensive? Then GMRES is your best bet, at least if the number of iterations is limited.
- Are matrix-vector multiplications inexpensive, and do you expect that many iterations are needed (for example because you use a simple preconditioner)? Then use IDR(s) (with s = 4 as a good default).