Topics: The CG Algorithm, Algorithmic Options, CG's Two Main Convergence Theorems


Topics: The CG Algorithm; Algorithmic Options; CG's Two Main Convergence Theorems; What about non-SPD systems? Methods requiring small history; Methods requiring large history; Summary of solvers. 1 / 52

Conjugate gradient method (Hestenes and Stiefel, 1952). For A an N × N SPD matrix. In exact arithmetic it solves Ax = b in N steps. In floating-point arithmetic there is no guaranteed stopping, but it often converges in many fewer than N steps. It is the optimal method for minimizing, with A SPD,
    J(x) = (1/2) x^T A x - x^T b.
2 / 52
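
Aside, not from the slides: a quick MATLAB check that the minimizer of J is the solution of Ax = b, for a small SPD test matrix (the matrix and tolerances here are illustrative choices).

    % Check numerically that the minimizer of J(x) = 0.5*x'*A*x - x'*b
    % is the solution of A*x = b, for a small SPD matrix.
    A = gallery('poisson',10);          % 100 x 100 sparse SPD test matrix
    b = randn(size(A,1),1);
    J = @(x) 0.5*(x'*A*x) - x'*b;       % the quadratic functional on the slide
    xstar = A\b;                        % exact solution of A*x = b
    for k = 1:5                         % J increases in every direction away from xstar
        d = randn(size(b));
        assert(J(xstar + 1e-3*d) > J(xstar));
    end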

Descent method. Given an SPD A, an initial guess x_0, and a maximum number of iterations itmax:
    r_0 = b - A x_0
    for n = 0:itmax
        choose a descent direction d_n
        α_n := arg min_α J(x_n + α d_n) = ⟨d_n, r_n⟩ / ⟨d_n, A d_n⟩
        x_{n+1} = x_n + α_n d_n
        r_{n+1} = b - A x_{n+1}
        if converged, stop, end
    end
Steepest descent uses d_n = r_n; CG uses a conjugate direction (see the sketch below).
3 / 52
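
A minimal MATLAB sketch of the descent loop above with the steepest-descent choice d_n = r_n; the function name and the relative-residual stopping test are illustrative choices, not from the slides.

    % Save as steepest_descent.m.  Steepest descent for SPD A: d_n = r_n,
    % exact line search alpha_n = <d,r>/<d,A d>.
    function [x,nits] = steepest_descent(A,b,x,tol,itmax)
        r = b - A*x;
        for n = 1:itmax
            d = r;                        % steepest descent direction
            alpha = (d'*r)/(d'*(A*d));    % exact line search for J
            x = x + alpha*d;
            r = b - A*x;
            if norm(r) <= tol*norm(b), break, end
        end
        nits = n;
    end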

Conjugate search direction. Definition: conjugate means orthogonal in the A-inner product. Take
    A = [ b^2  0 ;  0  a^2 ]   and   J(x) = x^T A x.
The level curve J(x) = const is the ellipse x^2/a^2 + y^2/b^2 = const. Parametrize it by
    x = ( a cos θ, b sin θ )^T,   with tangent vector   t = dx/dθ = ( -a sin θ, b cos θ )^T.
Then x and t are conjugate:
    ⟨x, t⟩_A = x^T A t = -a^2 b^2 sin θ cos θ + a^2 b^2 sin θ cos θ = 0.
A good choice of d_n: a vector conjugate to the tangent vector.
4 / 52

Conjugate Gradient Algorithm (294). Given an SPD A, an initial guess x_0, and a maximum number of iterations itmax:
    r_0 = b - A x_0
    d_0 = r_0
    for n = 1:itmax
        α_{n-1} = ⟨d_{n-1}, r_{n-1}⟩ / ⟨d_{n-1}, A d_{n-1}⟩
        x_n = x_{n-1} + α_{n-1} d_{n-1}
        r_n = b - A x_n
        if converged, stop, end
        β_n = ⟨r_n, r_n⟩ / ⟨r_{n-1}, r_{n-1}⟩
        d_n = r_n + β_n d_{n-1}
    end
5 / 52
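
A minimal MATLAB sketch of Algorithm 294 as written on the slide; the function name and the relative-residual stopping test are illustrative choices.

    % Save as cg_sketch.m.  Plain CG for SPD A, following Algorithm 294
    % (explicit residual r_n = b - A*x_n, as on the slide).
    function [x,nits] = cg_sketch(A,b,x,tol,itmax)
        r = b - A*x;
        d = r;
        for n = 1:itmax
            Ad    = A*d;
            alpha = (d'*r)/(d'*Ad);
            x     = x + alpha*d;
            rold  = r;
            r     = b - A*x;
            if norm(r) <= tol*norm(b), break, end
            beta  = (r'*r)/(rold'*rold);
            d     = r + beta*d;
        end
        nits = n;
    end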

Features of CG. A three-term recursion, or a coupled two-term recursion. Only three vectors need to be stored. Worst case: O(√cond(A)) iterations per significant digit of accuracy; common cases are much faster. In exact arithmetic, CG reaches the exact solution of an N × N system in N steps or fewer. Low FLOP count once the matrix-vector product is done. 6 / 52

Non-SPD case You can have a short recursion (few vectors need to be stored) or you can have reasonably fast convergence. You cannot have both! 7 / 52

Example
A=gallery('poisson',100); x=rand(10000,1); b=A*x;
[y,flag,relres,iter]=pcg(A,b,1.e-8,10000,[],[],x);
    flag = 0, iter = 0    (the initial guess x is already the exact solution)
[y,flag,relres,iter]=pcg(A,b,1.e-8,10000);    (default: initial guess is zero)
    flag = 0, iter = 259
norm(y-x)/norm(x)
    ans = 4.2285e-07
For comparison, jacobi2d requires 36917 iterations.
8 / 52

Speedy! Repeating [y,flag,relres,iter]=pcg(A,b,tol,10000); iter for decreasing tolerances:
    tol = 1.e-1:  iter = 3     (3 more)
    tol = 1.e-2:  iter = 15    (12 more)
    tol = 1.e-3:  iter = 88    (73 more)
    tol = 1.e-4:  iter = 118   (30 more)
    tol = 1.e-5:  iter = 139   (21 more)
    tol = 1.e-6:  iter = 195   (56 more)
    tol = 1.e-7:  iter = 222   (27 more)
    tol = 1.e-8:  iter = 259   (37 more)
9 / 52

Example (Example 196: Jacobi took 35 iterations to get 2-digit accuracy). Apply CG to the system
    2u_1 -  u_2               = 1
    -u_1 + 2u_2 -  u_3        = 2
          - u_2 + 2u_3 -  u_4 = 3
                 - u_3 + 2u_4 = 4,
whose true solution is u_1 = 4, u_2 = 7, u_3 = 8, u_4 = 6.
10 / 52

Example results
    Iteration    1        2        3        4
    α            1.5      0.6222   0.5357   0.4000
    β            0.8750   0.2963   0.1607   0
    x_n          1.5      2.6667   3.5000   4.0000
                 3.0      5.3333   7.0000   7.0000
                 4.5      8.0000   8.0000   8.0000
                 6.0      6.0000   6.0000   6.0000
11 / 52

Homework. Text Exercises 296 and 297, and Exercise G.
296: Write a program to do CG for a matrix.
297: Modify the previous program to do CG for the MPP.
Exercise G: CG converges in at most N iterations for an N × N matrix. Show that for A = I, where I is the N × N identity matrix, CG converges in a single iteration, no matter how large N is! Hint: consider the system Ix = b, where b is an arbitrary vector of length N. Starting from an arbitrary initial condition x_0, follow the CG algorithm (by hand) and show that x_1 is the exact solution.
12 / 52

What if A is not SPD? It might work! The formula α_{n-1} = ⟨d_{n-1}, r_{n-1}⟩ / ⟨d_{n-1}, A d_{n-1}⟩ might divide by zero. If it never divides by zero, the method should still converge in N steps in exact arithmetic. 13 / 52

Example: CG converges even though A is not SPD. Take the 10 × 10 symmetric indefinite tridiagonal matrix with off-diagonal entries -1 and diagonal (2, 2, 2, 2, 2, -2, -2, -2, -2, -2):
     2 -1  0  0  0  0  0  0  0  0
    -1  2 -1  0  0  0  0  0  0  0
     0 -1  2 -1  0  0  0  0  0  0
     0  0 -1  2 -1  0  0  0  0  0
     0  0  0 -1  2 -1  0  0  0  0
     0  0  0  0 -1 -2 -1  0  0  0
     0  0  0  0  0 -1 -2 -1  0  0
     0  0  0  0  0  0 -1 -2 -1  0
     0  0  0  0  0  0  0 -1 -2 -1
     0  0  0  0  0  0  0  0 -1 -2
The errors over 10 CG iterations: 7.9038, 3.5936, 25.2284, 2.8189, 2.5492, 1.4865, 5.3006, 1.059, 2.0833, 1.1638e-13.
14 / 52
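
A sketch of how this experiment could be reproduced; the right-hand side and initial guess used for the slide are not shown, so the printed error values will differ.

    % Plain CG (no safeguards) applied to the symmetric indefinite matrix above.
    n  = 10;
    dg = [2*ones(5,1); -2*ones(5,1)];
    A  = spdiags([-ones(n,1) dg -ones(n,1)], -1:1, n, n);  % symmetric, indefinite
    xe = randn(n,1);  b = A*xe;                            % manufactured solution
    x  = zeros(n,1);  r = b;  p = r;
    for k = 1:n
        Ap    = A*p;
        alpha = (r'*r)/(p'*Ap);          % may divide by zero for non-SPD A
        x     = x + alpha*p;
        rnew  = r - alpha*Ap;
        beta  = (rnew'*rnew)/(r'*r);
        p     = rnew + beta*p;   r = rnew;
        fprintf('error = %g\n', norm(x - xe));
    end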

Topics: The CG Algorithm; Algorithmic Options; CG's Two Main Convergence Theorems; What about non-SPD systems? Methods requiring small history; Methods requiring large history; Summary of solvers. 15 / 52

Algorithmic options.
1. An equivalent expression for α_n is α_n = ⟨r_n, r_n⟩ / ⟨d_n, A d_n⟩.
2. The update r_{n+1} = b - A x_{n+1} is equivalent, in exact arithmetic, to r_{n+1} = r_n - α_n A d_n: since x_{n+1} = x_n + α_n d_n, multiply by -A and add b to both sides to get
    b - A x_{n+1} (= r_{n+1}) = b - A x_n (= r_n) - α_n A d_n.
16 / 52
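
An illustration (mine, not from the slides) of option 2: inside CG the recursively updated residual and the explicitly computed residual agree only up to rounding, which is why both variants are listed.

    % Compare r_{n+1} = r_n - alpha_n*A*d_n (recursive) with b - A*x_{n+1} (explicit).
    A = gallery('poisson',30);  b = randn(size(A,1),1);
    x = zeros(size(b));  r = b;  d = r;
    for n = 1:50
        Ad    = A*d;
        alpha = (r'*r)/(d'*Ad);
        x     = x + alpha*d;
        rrec  = r - alpha*Ad;            % recursive update
        rexp  = b - A*x;                 % explicit residual
        beta  = (rrec'*rrec)/(r'*r);
        d     = rrec + beta*d;   r = rrec;
        fprintf('n=%2d  ||r_rec - r_exp|| = %.2e\n', n, norm(rrec - rexp));
    end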

Topics: The CG Algorithm; Algorithmic Options; CG's Two Main Convergence Theorems; What about non-SPD systems? Methods requiring small history; Methods requiring large history; Summary of solvers. 17 / 52

Definitions.
Span: let z_1, ..., z_m be m vectors. Then span{z_1, ..., z_m} is the set of all linear combinations of z_1, ..., z_m, i.e., the subspace
    span{z_1, ..., z_m} = { x = Σ_{i=1}^m α_i z_i : α_i ∈ R }.
Krylov subspace: let x_0 be given and r_0 = b - A x_0. The Krylov subspace determined by r_0 and A is
    X_n = X_n(A; r_0) = span{r_0, A r_0, ..., A^{n-1} r_0},
and the affine Krylov space determined by r_0 and A is
    K_n = K_n(A; x_0) = x_0 + X_n = {x_0 + x : x ∈ X_n}.
18 / 52
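
An illustration (mine) of the definition: the raw Krylov vectors r_0, A r_0, ..., A^{n-1} r_0 span X_n but quickly become nearly linearly dependent, which is why practical methods orthogonalize them (the Arnoldi process, later).

    % Build the raw Krylov basis and look at its conditioning.
    A  = gallery('poisson',20);
    x0 = zeros(size(A,1),1);  b = randn(size(A,1),1);
    r0 = b - A*x0;
    m  = 8;
    K  = zeros(length(r0), m);
    K(:,1) = r0;
    for j = 2:m
        K(:,j) = A*K(:,j-1);
    end
    fprintf('condition number of the raw Krylov basis: %.2e\n', cond(K));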

Important facts about CG, Proposition 302. The CG iterates x_j, residuals r_j, and search directions d_j satisfy
    x_j ∈ x_0 + span{r_0, A r_0, ..., A^{j-1} r_0},
    r_j ∈ r_0 + A span{r_0, A r_0, ..., A^{j-1} r_0},
    d_j ∈ span{r_0, A r_0, ..., A^j r_0}.
Note the misprint in the book!
19 / 52

Proof. The third result is proved by induction: d_0 = r_0 ∈ span{r_0}. Assume d_{n-1} ∈ span{r_0, ..., r_{n-1}}; then d_n = r_n + β_n d_{n-1} ∈ span{r_0, ..., r_n}. The proofs of the first and second results are similar. 20 / 52

Homework Exercise F: Complete the proof of Proposition 302. 21 / 52

First convergence theorem 304. Let A be SPD. Then the CG method satisfies the following:
1. The n-th residual is globally optimal over the affine space K_n in the A^{-1}-norm:
    ‖r_n‖_{A^{-1}} = min_{r ∈ r_0 + A X_n} ‖r‖_{A^{-1}}.
2. The n-th error is globally optimal over K_n in the A-norm:
    ‖e_n‖_A = min_{e ∈ e_0 + X_n} ‖e‖_A.
3. J(x_n) is the global minimum over K_n:
    J(x_n) = min_{x ∈ K_n} J(x).
4. Furthermore, the residuals are orthogonal and the search directions are A-orthogonal:
    ⟨r_n, r_k⟩ = 0 for k ≠ n,    ⟨d_n, d_k⟩_A = 0 for k ≠ n.
22 / 52

Prove ⟨r_n, r_k⟩ = ⟨d_n, A d_k⟩ = 0 for k < n.
By induction: vacuously true for n = 0; assume true for n - 1. First prove ⟨r_n, r_{n-1}⟩ = 0 and ⟨d_n, d_{n-1}⟩_A = 0.
Using r_n = r_{n-1} - α_{n-1} A d_{n-1} and α_{n-1} = ⟨r_{n-1}, r_{n-1}⟩ / ⟨d_{n-1}, A d_{n-1}⟩ (note ⟨A d_{n-1}, r_{n-1}⟩ = ⟨A d_{n-1}, d_{n-1}⟩, since d_{n-1} = r_{n-1} + β_{n-1} d_{n-2} and ⟨A d_{n-1}, d_{n-2}⟩ = 0 by the induction hypothesis):
    ⟨r_n, r_{n-1}⟩ = ⟨r_{n-1}, r_{n-1}⟩ - α_{n-1} ⟨A d_{n-1}, d_{n-1}⟩ = 0.
Using d_n = r_n + β_n d_{n-1}, r_n - r_{n-1} = -α_{n-1} A d_{n-1}, and β_n = ⟨r_n, r_n⟩ / ⟨r_{n-1}, r_{n-1}⟩:
    ⟨d_n, A d_{n-1}⟩ = ⟨r_n, A d_{n-1}⟩ + β_n ⟨d_{n-1}, A d_{n-1}⟩
                     = -⟨r_n, r_n - r_{n-1}⟩ / α_{n-1} + β_n ⟨d_{n-1}, A d_{n-1}⟩
                     = -⟨r_n, r_n⟩ / α_{n-1} + β_n ⟨d_{n-1}, A d_{n-1}⟩
                     = -⟨d_{n-1}, A d_{n-1}⟩ ⟨r_n, r_n⟩ / ⟨r_{n-1}, r_{n-1}⟩ + ⟨r_n, r_n⟩ ⟨d_{n-1}, A d_{n-1}⟩ / ⟨r_{n-1}, r_{n-1}⟩
                     = 0.
23 / 52

Continue the proof of ⟨r_n, r_k⟩ = ⟨d_n, A d_k⟩ = 0: the non-adjacent case, k ≤ n - 1 at the step from n to n + 1.
Using r_{n+1} = r_n - α_n A d_n and the fact that r_k ∈ span{d_0, ..., d_k}:
    ⟨r_{n+1}, r_k⟩ = ⟨r_n, r_k⟩ - α_n ⟨A d_n, r_k⟩ = 0 - 0 = 0,
since ⟨r_n, r_k⟩ = 0 by the induction hypothesis and d_n is A-orthogonal to d_0, ..., d_k.
Using d_{n+1} = r_{n+1} + β_{n+1} d_n and r_{k+1} - r_k = -α_k A d_k:
    ⟨d_{n+1}, A d_k⟩ = ⟨r_{n+1}, A d_k⟩ + β_{n+1} ⟨d_n, A d_k⟩
                     = -(1/α_k) ⟨r_{n+1}, r_{k+1} - r_k⟩ + β_{n+1} ⟨d_n, A d_k⟩ = 0,
since the residual orthogonality just proved kills the first term and the induction hypothesis kills the second.
24 / 52

Prove J(x_n) = min_{x ∈ K_n} J(x): the case n = 1.
Induction proof. For n = 1, x_1 = x_0 + α_0 d_0 ∈ K_1. For arbitrary α,
    J(x_0 + α d_0) = (1/2) ⟨x_0 + α d_0, A(x_0 + α d_0)⟩ - ⟨b, x_0 + α d_0⟩
                   = (1/2) ⟨x_0, A x_0⟩ + α ⟨x_0, A d_0⟩ + (1/2) α^2 ⟨d_0, A d_0⟩ - ⟨b, x_0⟩ - α ⟨b, d_0⟩
                   = J(x_0) + α ⟨A x_0 - b, d_0⟩ + (1/2) α^2 ⟨d_0, A d_0⟩
                   = J(x_0) - α ⟨r_0, d_0⟩ + (1/2) α^2 ⟨d_0, A d_0⟩.
This is minimized when
    -⟨r_0, d_0⟩ + α ⟨d_0, A d_0⟩ = 0,
so α = α_0 yields the minimum over K_1.
25 / 52

Prove J(x_n) = min_{x ∈ K_n} J(x): the induction step.
For x = x_{n-1} + α d_{n-1}, the previous calculation shows that the minimizing α is ⟨r_{n-1}, d_{n-1}⟩ / ⟨d_{n-1}, A d_{n-1}⟩ = α_{n-1}, so x_n minimizes J over the line x_{n-1} + α d_{n-1}.
To see that x_n actually minimizes J over K_n, suppose ỹ ∈ K_n. Write ỹ = x_n + y, so y ∈ X_n. Computing,
    J(x_n + y) = J(x_n) + ⟨x_n, A y⟩ + (1/2) ⟨y, A y⟩ - ⟨b, y⟩
               = J(x_n) + ⟨A x_n - b, y⟩ + (1/2) ⟨y, A y⟩
               = J(x_n) - ⟨r_n, y⟩ + (1/2) ⟨y, A y⟩.
But y ∈ X_n = span{r_0, ..., A^{n-1} r_0} = span{r_0, ..., r_{n-1}}, so ⟨r_n, y⟩ = 0 because ⟨r_n, r_k⟩ = 0 for k < n. Hence
    J(x_n + y) = J(x_n) + (1/2) ⟨y, A y⟩ > J(x_n) unless y = 0,
so J(x_n) is the minimum over all of K_n.
26 / 52

Finite termination. Let A be SPD. Then in exact arithmetic CG produces the exact solution of an N × N system in N steps or fewer. Proof: as long as the residuals r_0, r_1, ..., r_{l-1} are nonzero, they are orthogonal and hence linearly independent; since R^N contains at most N linearly independent vectors, r_l = 0 for some l ≤ N. 27 / 52

Remark on development Sections 6.2 and 6.3 in the text present a way to develop the CG method. We will be skipping it because of lack of time. 28 / 52

Convergence rate of CG. Notation: the set of real polynomials of degree n is denoted Π_n.
Theorem 327. Let A be SPD. The error at CG step n is bounded by
    ‖x - x_n‖_A ≤ ( min_{p ∈ Π_n, p(0)=1}  max_{λ_min ≤ x ≤ λ_max} |p(x)| ) ‖e_0‖_A.
Theorem 329. Given any ε > 0, for
    n ≥ (1/2) √cond(A) ln(2/ε) + 1
the error in the CG iterations is reduced by the factor ε:
    ‖x_n - x‖_A ≤ ε ‖x_0 - x‖_A.
29 / 52

Idea of proof. 1. e_n ∈ e_0 + X_n ⟹ e_n = (polynomial in A) e_0. 2. e_n is optimal. 3. Chebychev polynomials satisfy known bounds. 4. CG must be no worse than the Chebychev bounds. 30 / 52

Polynomial bounds.
    r_n ∈ r_0 + A ( span{r_0, A r_0, ..., A^{n-1} r_0} )
    e_n ∈ e_0 + span{A e_0, A^2 e_0, ..., A^n e_0}
    e_n = [ I + a_1 A + a_2 A^2 + ... + a_n A^n ] e_0 = p(A) e_0
    ‖e_n‖_A = min_{p ∈ Π_n, p(0)=1} ‖p(A) e_0‖_A ≤ ( min_{p ∈ Π_n, p(0)=1} ‖p(A)‖_A ) ‖e_0‖_A
    ‖p(A)‖_A = max_{λ ∈ spectrum(A)} |p(λ)| ≤ max_{λ_min ≤ x ≤ λ_max} |p(x)|.
31 / 52

Chebychev polynomials, min-max problem. The Chebychev polynomials T_n(x) = cos(n cos^{-1}(x)) can be scaled and translated to [a, b] (where a = λ_min, b = λ_max):
    p_n(x) = T_n( (b + a - 2x)/(b - a) ) / T_n( (b + a)/(b - a) ).
These p_n are known to attain the min-max value. Hence
    min_{p ∈ Π_n, p(0)=1}  max_{λ_min ≤ x ≤ λ_max} |p(x)| = max_{a ≤ x ≤ b} |T_n((b + a - 2x)/(b - a))| / T_n((b + a)/(b - a))
                                                          = 1 / T_n((b + a)/(b - a)) = 2 σ^n / (1 + σ^{2n}),
where
    σ = (1 - √(a/b)) / (1 + √(a/b)) = (√κ - 1)/(√κ + 1) = 1 - 2/√κ + O(1/κ),    κ = cond(A) = b/a.
32 / 52

Scaled Chebychev polynomials. [Figure: the scaled Chebychev polynomial p_n(x) plotted on [λ_min, λ_max], equioscillating between its maximum and minimum values.] 33 / 52

Chebychev polynomials: demo script do_cheby.m. 34 / 52
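
The script do_cheby.m itself is not reproduced in the transcript. A sketch of a script that plots the scaled, translated Chebychev polynomial of the previous slide; all names and parameter values here are illustrative guesses, not the course file.

    % Plot p_n(x) = T_n((b+a-2x)/(b-a)) / T_n((b+a)/(b-a)) on [a,b].
    a = 1; b = 100;                        % stand-ins for lambda_min, lambda_max
    x = linspace(a, b, 1000);
    Tin  = @(n,t) cos(n*acos(t));          % Chebychev T_n for |t| <= 1
    Tout = @(n,t) cosh(n*acosh(t));        % Chebychev T_n for t > 1
    hold on
    for n = [2 5 10]
        p = Tin(n, (b + a - 2*x)/(b - a)) / Tout(n, (b + a)/(b - a));
        plot(x, p)
    end
    xlabel('x'), ylabel('p_n(x)'), legend('n=2','n=5','n=10')
    hold off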

How many iterations? How many iterations are needed to get 2 σ^n / (1 + σ^{2n}) ≤ ε? Since σ ≈ 1 - 2/√κ, we have log σ ≈ -2/√κ, and
    σ^n ≤ ε/2  ⟺  n log σ ≤ log(ε/2),  i.e.  n ≥ (√κ/2) log(2/ε),
so it suffices to take n ≥ (√κ/2) log(2/ε) + 1.
35 / 52
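
A small check (mine) of this estimate against the Poisson example from earlier in the slides; condest is used as a stand-in for cond(A).

    % Compare the upper-bound estimate n >= 0.5*sqrt(kappa)*log(2/tol) + 1
    % with the observed pcg iteration count.
    A     = gallery('poisson',100);
    b     = A*rand(size(A,1),1);
    tol   = 1.e-8;
    kappa = condest(A);                                % estimate of cond(A)
    n_est = ceil(0.5*sqrt(kappa)*log(2/tol) + 1);
    [~,~,~,n_obs] = pcg(A,b,tol,10000);
    fprintf('upper-bound estimate: %d, observed: %d iterations\n', n_est, n_obs);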

Polynomial error. e_n = [ I + a_1 A + a_2 A^2 + ... + a_n A^n ] e_0 = p(A) e_0. Repeated eigenvalues are treated as a single eigenvalue, which accelerates convergence: K distinct eigenvalues ⟹ convergence in K iterations (recall Exercise G). Clusters of eigenvalues also speed up convergence. 36 / 52
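
An illustration (mine) of the "K distinct eigenvalues" claim: an SPD matrix built to have only three distinct eigenvalues, on which pcg needs only about three iterations.

    % 200 x 200 SPD matrix with eigenvalues {1, 2, 5} only.
    n = 200;
    [Q,~] = qr(randn(n));                    % random orthogonal matrix
    lam   = [ones(100,1); 2*ones(60,1); 5*ones(40,1)];
    A     = Q*diag(lam)*Q';
    A     = (A + A')/2;                      % symmetrize against rounding
    b     = randn(n,1);
    [~,flag,relres,iter] = pcg(A,b,1e-10,n);
    fprintf('flag = %d, iter = %d, relres = %.1e\n', flag, iter, relres);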

Preconditioning. Instead of solving Ax = b, solve MAx = Mb. M = A^{-1}: converges in 1 iteration. M ≈ A^{-1}, but computing Mx is fast. M = (L L^T)^{-1}, where L L^T are approximate factors of A. A few iterations of another method (such as Gauss-Seidel). A universe of alternatives. 37 / 52

PCG Algorithm for solving Ax = b. Given an SPD matrix A, a preconditioner M, an initial guess x_0, a right-hand side b, and a maximum number of iterations itmax:
    r_0 = b - A x_0
    solve M d_0 = r_0
    z_0 = d_0
    for n = 0:itmax
        α_n = ⟨r_n, z_n⟩ / ⟨d_n, A d_n⟩
        x_{n+1} = x_n + α_n d_n
        r_{n+1} = b - A x_{n+1}
        if converged, stop, end
        solve M z_{n+1} = r_{n+1}
        β_{n+1} = ⟨r_{n+1}, z_{n+1}⟩ / ⟨r_n, z_n⟩
        d_{n+1} = z_{n+1} + β_{n+1} d_n
    end
38 / 52
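
A minimal MATLAB sketch of the PCG algorithm above; the function name and the interface (a handle Msolve that returns M\r) are illustrative choices.

    % Save as pcg_sketch.m.  Preconditioned CG with explicit residuals,
    % following the slide; Msolve(r) should return M\r.
    function [x,nits] = pcg_sketch(A,b,x,Msolve,tol,itmax)
        r = b - A*x;
        z = Msolve(r);
        d = z;
        for n = 0:itmax
            Ad    = A*d;
            alpha = (r'*z)/(d'*Ad);
            x     = x + alpha*d;
            rnew  = b - A*x;
            if norm(rnew) <= tol*norm(b), break, end
            znew  = Msolve(rnew);
            beta  = (rnew'*znew)/(r'*z);
            d     = znew + beta*d;
            r = rnew;  z = znew;
        end
        nits = n;
    end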

Example. Write A = U^T U, where U is upper triangular: the Cholesky factorization. The factors have fill-in (curse of dimensionality). Only keep nonzeros where A is nonzero: incomplete Cholesky. 39 / 52

Example
N=50; A=gallery('poisson',N); xact=sin(1:N^2)'; b=A*xact; tol=1.e-6; maxit=N^2;
tic; [x,flag,relres,iter0] = pcg(A,b,tol,maxit); toc
iter0
    Elapsed time is 0.023921 seconds.
    iter0 = 73
U=chol(A); U(A==0)=0;
tic; [x,flag,relres,iter] = pcg(A,b,tol,maxit,U',U); toc
iter
    Elapsed time is 0.011928 seconds.
    iter = 19
40 / 52
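
As an aside (not on the slide), MATLAB also provides ichol, which returns a lower-triangular incomplete Cholesky factor; a sketch of the equivalent call, assuming the same A, b, tol, maxit as above:

    % ichol gives L with A ~ L*L' (zero fill-in by default);
    % pass it to pcg as the split preconditioner (L, L').
    L = ichol(A);
    [x,flag,relres,iter_ic] = pcg(A,b,tol,maxit,L,L');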

Topics: The CG Algorithm; Algorithmic Options; CG's Two Main Convergence Theorems; What about non-SPD systems? Methods requiring small history; Methods requiring large history; Summary of solvers. 41 / 52

GMRES is most popular. Youcef Saad and Martin H. Schultz, GMRES: A Generalized Minimal Residual Algorithm for Solving Nonsymmetric Linear Systems, SIAM J. Sci. and Stat. Comput. 7 (1986), pp. 856-869.
x_n is the vector that minimizes ‖Ax - b‖ over the Krylov space K_n = span{r^(0), A r^(0), A^2 r^(0), ..., A^{n-1} r^(0)}.
The basis for K_n is formed by modified Gram-Schmidt orthogonalization, the Arnoldi process:
    w^(n) = A v^(n)
    for k = 1, ..., n
        w^(n) = w^(n) - (w^(n), v^(k)) v^(k)
    end
    v^(n+1) = w^(n) / ‖w^(n)‖
42 / 52
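
A MATLAB sketch of the Arnoldi process above (modified Gram-Schmidt), building an orthonormal basis V of the Krylov space together with the small Hessenberg matrix H that GMRES works with; the function name is an illustrative choice.

    % Save as arnoldi_sketch.m.
    function [V,H] = arnoldi_sketch(A,r0,m)
        n = length(r0);
        V = zeros(n,m+1);  H = zeros(m+1,m);
        V(:,1) = r0/norm(r0);
        for j = 1:m
            w = A*V(:,j);
            for k = 1:j
                H(k,j) = w'*V(:,k);
                w = w - H(k,j)*V(:,k);    % modified Gram-Schmidt step
            end
            H(j+1,j) = norm(w);
            if H(j+1,j) == 0, break, end  % happy breakdown: K_j is A-invariant
            V(:,j+1) = w/H(j+1,j);
        end
    end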

Features of GMRES. A not SPD ⟹ no three-term relation. All of the basis vectors must be kept. Usually the whole process is restarted after m steps: GMRES(m). Reference: Yousef Saad, Iterative Methods for Sparse Linear Systems, Second edition, SIAM, 2003, www-users.cs.umn.edu/%7esaad/itermethbook_2nded.pdf. 43 / 52

Topics: The CG Algorithm; Algorithmic Options; CG's Two Main Convergence Theorems; What about non-SPD systems? Methods requiring small history; Methods requiring large history; Summary of solvers. 44 / 52

Conjugate gradient squared (cgs). Conjugate gradient applied to A^H A x = A^H b (A^H is the Hermitian transpose: transpose plus complex conjugate). Convergence behavior can be irregular. Two (not independent) matrix-vector multiplies per iteration. A^H A: the condition number is squared. Works on general matrices, even complex. 45 / 52

Conjugate gradient applied to the normal equations (cgn). Conjugate gradient applied to A^H A x = A^H b directly; A^H is formed. Two matrix-vector multiplies per iteration. 46 / 52

minres and symmlq. Variants of the CG method for symmetric indefinite systems. MINRES minimizes the residual in the 2-norm. SYMMLQ solves the projected system, but does not minimize anything; it keeps the residual orthogonal to all previous ones. SYMMLQ uses an LQ decomposition to solve an intermediate system. 47 / 52

Bi-conjugate gradient (bicg). Two coupled recurrences, one for the residuals and directions r, p and one for the shadow residuals and directions r̃, p̃:
    r^(n) = r^(n-1) - α_n A p^(n),          r̃^(n) = r̃^(n-1) - α_n A^T p̃^(n)
    p^(n) = r^(n-1) + β_{n-1} p^(n-1),      p̃^(n) = r̃^(n-1) + β_{n-1} p̃^(n-1)
    α_n = ⟨r̃^(n-1), r^(n-1)⟩ / ⟨p̃^(n), A p^(n)⟩,    β_n = ⟨r̃^(n), r^(n)⟩ / ⟨r̃^(n-1), r^(n-1)⟩
The p̃^(n) are conjugate-orthogonal to the p^(k). No minimization principle ⟹ irregular convergence. Two matrix-vector products per iteration, one with A^T. Works for non-symmetric matrices. Can break down.
48 / 52
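
A small usage sketch (mine): MATLAB's bicg on a nonsymmetric test matrix; the matrix and tolerances are illustrative choices.

    n = 400;
    A = gallery('poisson',20) + 0.3*spdiags(ones(n,1),2,n,n);  % break symmetry
    b = randn(n,1);
    [x,flag,relres,iter] = bicg(A,b,1e-8,n);
    fprintf('bicg: flag = %d, iter = %d, relres = %.1e\n', flag, iter, relres);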

Topics: The CG Algorithm; Algorithmic Options; CG's Two Main Convergence Theorems; What about non-SPD systems? Methods requiring small history; Methods requiring large history; Summary of solvers. 49 / 52

GMRES. x_n is the vector that minimizes ‖Ax - b‖ over the Krylov space K_n = span{r^(0), A r^(0), A^2 r^(0), ..., A^{n-1} r^(0)}. All basis vectors for K_n must be kept. Restart to keep the history size under control. 50 / 52
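
A small usage sketch (mine) of restarted GMRES in MATLAB: gmres(A,b,restart,tol,maxit) keeps at most restart basis vectors before restarting.

    n = 400;
    A = gallery('poisson',20) + 0.3*spdiags(ones(n,1),2,n,n);  % nonsymmetric
    b = randn(n,1);
    [x,flag,relres,iter] = gmres(A,b,20,1e-8,50);   % GMRES(20), up to 50 restarts
    fprintf('flag = %d, outer = %d, inner = %d, relres = %.1e\n', ...
            flag, iter(1), iter(2), relres);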

Topics: The CG Algorithm; Algorithmic Options; CG's Two Main Convergence Theorems; What about non-SPD systems? Methods requiring small history; Methods requiring large history; Summary of solvers. 51 / 52

How do I choose?
    Symmetric?
        yes -> Definite?
            yes -> CG
            no  -> MINRES or SYMMLQ or CR
        no  -> GMRES or CGN
Best to use no preconditioner at first; good methods will be better when preconditioned.
52 / 52