Summary of Iterative Methods for Non-symmetric Linear Equations That Are Related to the Conjugate Gradient (CG) Method


Leslie Foster, 11-5-2012

We will discuss the FOM (full orthogonalization method), CG, GMRES (generalized minimal residual), BICG (bi-conjugate gradient), QMR (quasi-minimal residual), CGS (conjugate gradient squared), BICGSTAB (bi-conjugate gradient stabilized), and LSQR (least squares QR) methods. A good reference for these iterative methods is Iterative Methods for Sparse Linear Systems, second edition, SIAM Press, 2003, by Yousef Saad. Much of this summary is motivated by Saad's discussion.

Suppose that we want to solve Ax = b for an n by n matrix A. All of the above methods are related to the following approach:

0. Let K_m and L_m be two m-dimensional subspaces of R^n. Approximate the solution to Ax = b by choosing x such that x - x_0 is in K_m; we will write this as x = x_0 + K_m. Select a particular such x so that b - Ax is orthogonal to the space L_m.

The first four methods above can be derived by using different choices of K_m and L_m. The other methods are variants of the first four methods.
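For concreteness, the following is a minimal Python/NumPy sketch of the projection principle (0): given a matrix V whose columns span K_m and a matrix W whose columns span L_m, it picks x = x_0 + V y so that b - Ax is orthogonal to the columns of W. The function name, the dense-matrix setting, and the assumption that W^T A V is nonsingular are illustrative choices, not part of the original summary.

    import numpy as np

    def projection_step(A, b, x0, V, W):
        """One projection step: x = x0 + V y chosen so that W^T (b - A x) = 0."""
        r0 = b - A @ x0
        # The orthogonality condition W^T (r0 - A V y) = 0 is a small m-by-m linear system.
        y = np.linalg.solve(W.T @ (A @ V), W.T @ r0)
        return x0 + V @ y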

FOM Method (A is non-symmetric)

Let K_m = K_m(A, r_0) = span(r_0, A r_0, A^2 r_0, ..., A^{m-1} r_0) and L_m = K_m, where x_0 is an initial guess at the solution of Ax = b and r_0 = b - A x_0. A subspace of the form span(v, Av, A^2 v, ..., A^{m-1} v) for some vector v is called a Krylov subspace. Selection of a Krylov subspace leads to nice algebra in all of the methods mentioned below. The full orthogonalization method selects an orthonormal basis for K_m and uses this basis to help select x according to principle (0). The algorithm that selects an orthonormal basis for a Krylov subspace is called Arnoldi's method.

CG Method (A is symmetric)

The CG method assumes that A is symmetric. We let K_m = K_m(A, r_0) = span(r_0, A r_0, A^2 r_0, ..., A^{m-1} r_0) and L_m = K_m. Therefore the CG method is the FOM method applied to a symmetric matrix A.

Some details of the CG method: we need an orthonormal basis for K_m, and we will construct the basis sequentially. Suppose that v_1, v_2, ..., v_m is an orthonormal basis for K_m and let V_m be the n by m matrix (v_1, v_2, ..., v_m).

Problem: How can we get v_{m+1} in K_{m+1} that is perpendicular to v_1, v_2, ..., v_m?

Solution:

1. First note that since v_m is in span(r_0, A r_0, ..., A^{m-1} r_0), A v_m is in span(A r_0, ..., A^m r_0), which is contained in K_{m+1}.

2. Consider the problem min_y || A v_m - V_m y ||. Let u_{m+1} = A v_m - V_m y be the residual of this problem. Now
(a) u_{m+1} is in K_{m+1}, since by (1) A v_m is in K_{m+1} and by definition the columns of V_m are in K_m, which is contained in K_{m+1};
(b) u_{m+1} is perpendicular to v_1, ..., v_m, since u_{m+1} is the residual of the least squares problem with matrix V_m. (Recall that the residual is perpendicular to the subspace spanned by the columns of V_m.)

3. Let v_{m+1} = u_{m+1} / ||u_{m+1}||. By (a) and (b) in (2), v_{m+1} solves the above problem.
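The construction above is exactly what Arnoldi's method does step by step: multiply the latest basis vector by A, orthogonalize against all previous basis vectors, and normalize. Below is a short Python/NumPy sketch (modified Gram-Schmidt form); the function and variable names are illustrative and the code is not taken from the summary.

    import numpy as np

    def arnoldi(A, r0, m):
        """Build an orthonormal basis V of the Krylov subspace K_m(A, r0)."""
        n = len(r0)
        V = np.zeros((n, m + 1))
        H = np.zeros((m + 1, m))            # recurrence coefficients (upper Hessenberg)
        V[:, 0] = r0 / np.linalg.norm(r0)
        for j in range(m):
            w = A @ V[:, j]                 # multiply the latest basis vector by A
            for i in range(j + 1):          # orthogonalize against all previous vectors
                H[i, j] = V[:, i] @ w
                w = w - H[i, j] * V[:, i]
            H[j + 1, j] = np.linalg.norm(w)
            if H[j + 1, j] == 0.0:          # the Krylov subspace stopped growing
                return V[:, :j + 1], H[:j + 1, :j]
            V[:, j + 1] = w / H[j + 1, j]
        return V, H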

Theorem. If A is symmetric then the solution vector y to min_y || A v_m - V_m y || has y_i = 0 for i = 1, 2, ..., m-2.

Proof. By the normal equations, (V_m^T V_m) y = V_m^T (A v_m). However, since V_m has orthonormal columns, V_m^T V_m = I and so y = V_m^T (A v_m). Since A is symmetric, A = A^T, so y = V_m^T (A^T v_m). Since V_m has columns v_1, v_2, ..., v_m, we get y_i = v_i^T A^T v_m = (v_i^T A^T v_m)^T = v_m^T (A v_i). But we chose v_m to be perpendicular to K_{m-1}. Also v_i is in K_i, so A v_i is in K_{i+1}. Therefore as long as i + 1 <= m - 1, that is i <= m - 2, v_m is perpendicular to A v_i. Consequently y_i = 0 for i = 1, 2, ..., m-2.

Corollary. u_{m+1} = A v_m - V_m y = A v_m - v_{m-1} y_{m-1} - v_m y_m. Since v_{m+1} = u_{m+1} / ||u_{m+1}|| we have

4. v_{m+1} = (1 / ||u_{m+1}||) (A v_m - y_m v_m) - (y_{m-1} / ||u_{m+1}||) v_{m-1}.

Equation (4) is called a 3-term recurrence relation: we can get v_{m+1} in terms of just two prior vectors, v_m and v_{m-1}. CG can be implemented quite efficiently since it has a 3-term recurrence. If A is non-symmetric there is no 3-term recurrence (the proof of (4) relied critically on A = A^T). Therefore the FOM method (for non-symmetric matrices) is said to have long recurrences. Cost per step for CG: about 10n flops plus one product Ax.
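With the theorem in hand, the inner orthogonalization loop of Arnoldi's method collapses to the three-term recurrence (4) when A is symmetric. A minimal Python/NumPy sketch of that recurrence (the symmetric Lanczos process) follows; the names and stopping test are illustrative, not from the summary.

    import numpy as np

    def lanczos_basis(A, r0, m):
        """Orthonormal Krylov basis for symmetric A via the 3-term recurrence (4)."""
        n = len(r0)
        V = np.zeros((n, m + 1))
        V[:, 0] = r0 / np.linalg.norm(r0)
        beta_prev = 0.0
        for j in range(m):
            w = A @ V[:, j]
            if j > 0:
                w = w - beta_prev * V[:, j - 1]   # the y_{m-1} v_{m-1} term
            alpha = V[:, j] @ w
            w = w - alpha * V[:, j]               # the y_m v_m term
            beta = np.linalg.norm(w)              # this is ||u_{m+1}||
            if beta == 0.0:                       # breakdown: subspace stopped growing
                return V[:, :j + 1]
            V[:, j + 1] = w / beta
            beta_prev = beta
        return V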

GMRES (A is non-symmetric)

Let K_m = K_m(A, r_0) = span(r_0, A r_0, ..., A^{m-1} r_0) and L_m = A K_m = span(A r_0, A^2 r_0, ..., A^m r_0).

Theorem. Of all vectors of the form x = x_0 + K_m, GMRES selects the one with ||r|| = ||b - Ax|| minimal.

Proof. Let v_1, ..., v_m be an orthonormal basis for K_m and V_m = (v_1, v_2, ..., v_m). Then any x = x_0 + K_m can be written as x = x_0 + V_m y. Therefore r = b - Ax = b - A(x_0 + V_m y) = (b - A x_0) - A V_m y = r_0 - A V_m y, or r = r_0 - B y with B = A V_m, and min ||r|| = min_y ||r_0 - B y||. Therefore (by least squares geometry) the minimizing r is perpendicular to the subspace spanned by the columns of B. But the columns of B are A v_1, A v_2, ..., A v_m. Since v_1, v_2, ..., v_m are in K_m = span(r_0, ..., A^{m-1} r_0), then A v_1, A v_2, ..., A v_m are in span(A r_0, A^2 r_0, ..., A^m r_0) = L_m. Therefore b - Ax is perpendicular to L_m; that is, the minimal-residual x is exactly the x chosen by principle (0) with this L_m.

The usual implementation of GMRES constructs an orthonormal basis for K_m. If A is non-symmetric there are no short recurrences to generate this basis, so the GMRES method has long recurrences.

Corollary. The amount of work per step for GMRES increases as m increases. At step i, GMRES(m) costs about 4(i + 1)n flops plus one product Ax. As the number of steps increases, the work per step can become prohibitive. The GMRES(m) idea is, after m steps, to simply start GMRES over again with the current x as the new x_0.
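As an illustration of restarting, the sketch below calls SciPy's gmres with a restart length of 30, so the stored basis (and the roughly 4(i + 1)n per-step cost) never grows beyond 30 vectors. The tridiagonal test matrix and parameter values are invented for the example and are not from the summary.

    import numpy as np
    import scipy.sparse as sp
    from scipy.sparse.linalg import gmres

    n = 1000
    # A nonsymmetric tridiagonal test matrix (illustrative only).
    A = sp.diags([-1.0, 2.5, -0.5], offsets=[-1, 0, 1], shape=(n, n), format='csr')
    b = np.ones(n)

    x, info = gmres(A, b, restart=30, maxiter=200)    # restarted GMRES(30)
    print(info, np.linalg.norm(b - A @ x))            # info == 0 indicates convergence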

BICG (A is non-symmetric)

Let K_m = span(r_0, A r_0, ..., A^{m-1} r_0) and L_m = span(r_0, A^T r_0, ..., (A^T)^{m-1} r_0).

In implementing the method with this K_m and L_m it is particularly convenient to construct a basis (v_1, ..., v_m) for K_m and another basis (w_1, ..., w_m) for L_m such that v_j^T w_i = 0 for i ≠ j, but the v's are not orthogonal to each other, nor are the w's. Such a pair of sequences is called bi-orthogonal. The usual method of constructing the bi-orthogonal bases is called the Lanczos method. The bi-orthogonal sequences can be constructed with 3-term recurrences. However, breakdown (for example a division by 0 in the algorithm) or convergence difficulties are more likely. Cost per step for BICG: about 14n flops plus one product with A and one with A^T.

QMR

Using the basis generated by BICG, one can show that b - Ax = V_{m+1} (β e_1 - T_m y), where e_1 = (1, 0, ..., 0)^T, β is a scalar, T_m is tridiagonal, and V_{m+1} is the matrix formed by the basis vectors for K_{m+1} in the Lanczos process. If V_{m+1} were orthogonal, then ||b - Ax|| = ||β e_1 - T_m y||, and this last problem can be solved efficiently since T_m is tridiagonal. Since V_{m+1} is not orthogonal, we are not truly minimizing ||b - Ax|| when we minimize ||β e_1 - T_m y||; minimizing ||β e_1 - T_m y|| therefore gives a quasi-minimal residual (QMR). The advantage of QMR is that we get a 3-term recurrence and an efficient algorithm. A disadvantage is that we are not truly minimizing the residual b - Ax of the original problem. Cost per step for QMR: about 24n flops plus one product with A and one with A^T.
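Both BICG and QMR are available in SciPy, and each step applies A and A^T (here supplied implicitly by the sparse matrix). The small comparison below is only a usage sketch; the test problem is invented and nothing in it is taken from the summary itself.

    import numpy as np
    import scipy.sparse as sp
    from scipy.sparse.linalg import bicg, qmr

    n = 500
    A = sp.diags([-1.0, 3.0, -0.4], offsets=[-1, 0, 1], shape=(n, n), format='csr')
    b = np.ones(n)

    for solver in (bicg, qmr):
        x, info = solver(A, b)                        # each step uses one Ax and one A^T x
        print(solver.__name__, info, np.linalg.norm(b - A @ x))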

CGS

Note that BICG requires multiplying by both A and A^T, and for some applications this is not desirable. Looking at the relationships and formulas in BICG, it is possible to develop a method similar to BICG except that at each step, rather than multiplying by A and by A^T, we multiply by A twice. This is the idea of the CGS method. The method is often faster (roughly twice as fast?) than BICG, but it is more sensitive to rounding errors, and for some examples it has erratic convergence. Cost per step for CGS: about 16n flops plus two products with A.

BICGSTAB

This method again begins with BICG and looks for ways to avoid the use of A^T. In this case a modification of BICG is developed that has a free parameter, which is chosen with the intent of reducing the likelihood of the erratic convergence that is sometimes exhibited by BICG. Cost per step for BICGSTAB: about 20n flops plus two products with A.

LSQR

To solve Ax = b for a non-symmetric square matrix A, or to solve the least squares problem min ||b - Ax|| for a rectangular matrix A, one can solve the normal equations A^T A x = B x = c = A^T b. Here B = A^T A is symmetric positive definite (assuming that the columns of A are linearly independent), and thus one can solve the normal equations using the conjugate gradient method (sometimes this is called the CGNE method). However, there is a critical difference between solving the normal equations and working with Ax = b (for a square A) directly: cond(B) = cond(A^T A) = [cond(A)]^2, where the condition numbers are calculated using the usual two (Euclidean) norm. This leads to two important problems: (1) computer arithmetic errors can be larger, and (2) the convergence rate of the iterative method can be slower. To overcome, in part, these difficulties with CGNE, the LSQR method works directly with A rather than with A^T A. In addition, if A is sufficiently ill-conditioned, LSQR assumes that A is exactly singular and finds the solution to min ||b - Ax|| with ||x|| minimal. These features make LSQR one of the most robust methods for singular or nearly singular systems. Cost per step for LSQR: about 16n flops plus one product with A and one with A^T.
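The conditioning claim can be checked numerically, and LSQR itself is available in SciPy. In the sketch below, the random rectangular matrix is purely illustrative: the script confirms that cond(A^T A) is essentially [cond(A)]^2 and then calls lsqr, which works with A and A^T directly rather than forming A^T A.

    import numpy as np
    from scipy.sparse.linalg import lsqr

    rng = np.random.default_rng(0)
    A = rng.standard_normal((200, 50))     # rectangular test matrix with independent columns
    b = rng.standard_normal(200)

    print(np.linalg.cond(A) ** 2, np.linalg.cond(A.T @ A))   # essentially the same number

    result = lsqr(A, b)                                      # works with A, not A^T A
    x, istop, itn = result[0], result[1], result[2]
    print(istop, itn, np.linalg.norm(A.T @ (b - A @ x)))     # normal-equations residual near 0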

All of the methods based on the BICG (or Lanczos) process can encounter breakdown, where the algorithm cannot be continued, generally due to a division by 0. CG for symmetric positive definite matrices and GMRES for a non-symmetric matrix will not, in exact arithmetic, encounter breakdown prior to solving the problem, but they can have trouble due to rounding errors. If no breakdown occurs, each of the methods (except GMRES(m) with m < n) will converge in exact arithmetic in n or fewer steps.