Utrecht, November 15, 2017 — Fast iterative solvers — Gerard Sleijpen, Department of Mathematics, http://www.staff.science.uu.nl/~sleij101/
Review. To simplify, take x_0 = 0 and assume ‖b‖_2 = 1. Solving Ax = b for a general square matrix A.
Arnoldi decomposition: A V_k = V_{k+1} H_k with V_k orthonormal, H_k Hessenberg; H_k = L_k U_k = Q_k R_k.
v_1 = r_0 = b, x_k = V_k y_k, r_k = b − A x_k.
FOM: r_k ⊥ V_k ⟺ H_k y_k = e_1, so x_k = V_k (U_k^{-1} (L_k^{-1} e_1)).
GMRES: y_k = argmin_y ‖b − A V_k y‖_2 = argmin_y ‖e_1 − H_k y‖_2, so x_k = V_k (R_k^{-1} (Q_k^* e_1)).
The columns of V_k span K_k(A, r_0) ≡ {p(A) r_0 : p ∈ P_{k−1}}; K_k(A, r_0) = span(r_0, r_1, ..., r_{k−1}).
Review. To simplify, take x_0 = 0 and assume ‖b‖_2 = 1. Solving Ax = b for a Hermitian matrix A.
Lanczos decomposition: A V_k = V_{k+1} T_k with V_k orthonormal, T_k tridiagonal; T_k = L_k U_k = Q_k R_k.
v_1 = r_0 = b, x_k = V_k y_k, r_k = b − A x_k.
CG: r_k ⊥ V_k ⟺ T_k y_k = e_1, so x_k = V_k (U_k^{-1} (L_k^{-1} e_1)).
MINRES: y_k = argmin_y ‖b − A V_k y‖_2 = argmin_y ‖e_1 − T_k y‖_2, so x_k = (V_k R_k^{-1}) (Q_k^* e_1).
Note. Short recurrences (efficient computation) because T_k is tridiagonal, and because Ũ_k U_k = V_k (CG) and W_k R_k = V_k (MINRES) give short vector recurrences for the columns of Ũ_k and W_k.
Details. For MINRES, see Exercise 7.8.
Note. Less stable: we rely on exact arithmetic for the orthogonality of V_k (CG & MINRES). W_k R_k = V_k (MINRES) is a three-term vector recurrence for the w_k's.
Note. If A is positive definite, then CG minimizes as well: y_k = argmin_y ‖x − V_k y‖_A = argmin_y ‖b − A V_k y‖_{A^{-1}}.
SYMMLQ. Take x_k = A V_k y_k with y_k = argmin_y ‖x − A V_k y‖_2 ⟺ e_1 − T_k^* T_k y_k = 0, so x_k = V_{k+1} T_k (T_k^* T_k)^{-1} e_1 = (V_{k+1} Q_k) ((R_k^*)^{-1} e_1).
Details. For SYMMLQ, see Exercise 7.9.
FOM residual polynomials and Ritz values
Property. r_k^CG = r_k^FOM = p_k(A) r_0 ⊥ V_k, for some residual polynomial p_k of degree k, i.e., p_k(0) = 1. p_k^FOM = p_k is the kth (CG or) FOM residual polynomial.
Theorem. p_k^FOM(ϑ) = 0 ⟺ H_k y = ϑ y for some y ≠ 0: the zeros of p_k^FOM are precisely the kth order Ritz values. In particular,
p_k^FOM(λ) = ∏_{j=1}^{k} (1 − λ/ϑ_j)  (λ ∈ C).
Moreover, for a polynomial p of degree at most k with p(0) = 1, we have that p = p_k^FOM ⟺ p(H_k) = 0.
Proof. Exercise 6.6.
Theorem. Similarly, the zeros of p_k^GMRES are related to the harmonic Ritz values of H_k.
Solving Ax = b, an overview
[Flowchart, reconstructed as a decision outline:]
A = A^*?
  yes: A > 0? yes → CG; no: ill conditioned? yes → SYMMLQ; no → MINRES.
  no: good preconditioner available? no → GMRES; flexible preconditioner? yes → GCR;
      else: A + A^* strongly indefinite, or A has large imaginary eigenvalues?
        no → Bi-CGSTAB or IDR; yes → BiCGstab(l) or IDRstab.
Legend: "good precond." — a good preconditioner is available; "flex. precond." — the preconditioner is flexible; "str. indef." — A + A^* is strongly indefinite; "large im. eig." — A has large imaginary eigenvalues.
Ax = b with A an n×n non-singular matrix.
Today's topic. Iterative methods for general systems using short recurrences.
Program Lecture 8
• CG
• Bi-CG
• Bi-Lanczos
• Hybrid Bi-CG: Bi-CGSTAB, BiCGstab(l)
• IDR
A = A^* > 0: Conjugate Gradients [Hestenes–Stiefel '52]
x = 0, r = b, u = 0, ρ = 1
while ‖r‖ > tol do
  σ = −ρ, ρ = r^* r, β = ρ/σ
  u ← r − β u, c = A u
  σ = u^* c, α = ρ/σ
  r ← r − α c
  x ← x + α u
end while
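As a concrete illustration, the loop above translates almost line for line into NumPy. This is a minimal sketch, not the lecture's reference code; the slide's σ = −ρ bookkeeping is absorbed here into a positive β:

```python
import numpy as np

def cg(A, b, tol=1e-8, maxit=200):
    """Unpreconditioned CG for Hermitian positive definite A.
    One MV, two inner products, three vector updates per step."""
    x = np.zeros_like(b)
    r = b.copy()              # residual b - A x with x = 0
    u = np.zeros_like(b)
    rho = 1.0
    while np.linalg.norm(r) > tol and maxit > 0:
        maxit -= 1
        rho_old = rho
        rho = r @ r
        beta = rho / rho_old  # slide: beta = rho/sigma with sigma = -rho_old
        u = r + beta * u      # search direction
        c = A @ u             # the single matrix-vector product
        alpha = rho / (u @ c)
        r = r - alpha * c
        x = x + alpha * u
    return x
```

For a 2×2 SPD system the loop terminates (in exact arithmetic) after two steps.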
Construction of CG. There are four alternative derivations of CG:
• GCR (use A = A^*) → CR: use the A^{-1} inner product + efficient implementation.
• Lanczos + T = LU + efficient implementation.
• Orthogonalize residuals.
• Nonlinear CG to solve x = argmin_x̃ ½ ‖b − A x̃‖²_{A^{-1}}. [Exercise 7.3]
Conjugate Gradients, A = A^*, K = K^*
u_k = K^{-1} r_k − β_k u_{k−1}
r_{k+1} = r_k − α_k A u_k
Theorem. r_k, K u_k ∈ K_{k+1}(A K^{-1}, r_0). r_0, ..., r_{k−1} is a Krylov basis of K_k(A K^{-1}, r_0). If r_k, A u_k ⊥_{K^{-1}} r_{k−1}, then r_k, A u_k ⊥_{K^{-1}} K_k(A K^{-1}, r_0).
Proof. r_k = r_{k−1} − α_{k−1} A u_{k−1} ⊥_{K^{-1}} r_{k−1} by construction of α_{k−1};
r_k = r_{k−1} − α_{k−1} A u_{k−1} ⊥_{K^{-1}} K_{k−1}(A K^{-1}, r_0) by induction.
Proof (continued). A u_k = A K^{-1} r_k − β_k A u_{k−1} ⊥_{K^{-1}} r_{k−1} by construction of β_k;
A u_k = A K^{-1} r_k − β_k A u_{k−1} ⊥_{K^{-1}} K_{k−1}(A K^{-1}, r_0) by induction:
A K^{-1} r_k ⊥_{K^{-1}} K_{k−1}(A K^{-1}, r_0) ⟺ r_k ⊥_{K^{-1}} A K^{-1} K_{k−1}(A K^{-1}, r_0) ⟸ r_k ⊥_{K^{-1}} K_k(A K^{-1}, r_0).
A = A^* & K = K^*: Preconditioned CG
x = 0, r = b, u = 0, ρ = 1
while ‖r‖ > tol do
  solve K c = r for c
  σ = −ρ, ρ = c^* r, β = ρ/σ
  u ← c − β u, c = A u
  σ = u^* c, α = ρ/σ
  r ← r − α c
  x ← x + α u
end while
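The preconditioned loop differs from plain CG only in the solve with K. A minimal NumPy sketch (my own illustration; `Ksolve` applies K^{-1}, here again with β written in the standard positive-sign form):

```python
import numpy as np

def pcg(A, b, Ksolve, tol=1e-8, maxit=200):
    """Preconditioned CG; Ksolve(r) returns the solution c of K c = r."""
    x = np.zeros_like(b)
    r = b.copy()
    u = np.zeros_like(b)
    rho = 1.0
    while np.linalg.norm(r) > tol and maxit > 0:
        maxit -= 1
        c = Ksolve(r)          # preconditioner solve K c = r
        rho_old = rho
        rho = c @ r            # K^{-1}-inner product of the residual
        beta = rho / rho_old
        u = c + beta * u
        c = A @ u
        alpha = rho / (u @ c)
        r = r - alpha * c
        x = x + alpha * u
    return x
```

With the Jacobi preconditioner, `Ksolve = lambda r: r / np.diag(A)`.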
Properties CG
Pros
• Low costs per step: 1 MV, 2 DOT, 3 AXPY to increase the dimension of the Krylov subspace by one.
• Low storage: 5 large vectors (incl. b).
• Minimal residual method if A, K positive definite: ‖r_k‖_{A^{-1}} is minimal.
• Orthogonal residual method if A = A^*, K = K^*: r_k ⊥_{K^{-1}} K_k(A K^{-1}; r_0).
• No additional knowledge of properties of A is needed.
• Robust: CG always converges if A, K are positive definite.
Cons
• May break down if A = A^* is indefinite. Does not work if A ≠ A^*.
• CG is sensitive to evaluation errors if A = A^* is indefinite.
• Often loss of a) super-linear convergence, and b) accuracy, for two reasons: 1) loss of orthogonality in the Lanczos recursion; 2) as in FOM, bumps and peaks in the CG convergence history.
Program Lecture 8 — next: Bi-CG
For general square non-singular A
• Apply CG to the normal equations (A^* A x = A^* b): CGNE.
• Apply CG to A A^* y = b (then x = A^* y): Craig's method.
Disadvantage. Search in K_k(A^* A, ...): even if A = A^*, convergence is determined by A², i.e., the condition number is squared, .... Expansion of K_k requires 2 MVs (i.e., many costly steps).
For a discussion of Craig's method, see Exercise 8.1. For Craig versus GCR, see Exercise 8.6.
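The first option above (CG on A^* A x = A^* b) can be sketched in a few lines of NumPy — my own illustration, showing the two matrix-vector products per step that the slide counts:

```python
import numpy as np

def cg_normal_equations(A, b, tol=1e-10, maxit=500):
    """CG applied to A^T A x = A^T b. Each Krylov expansion costs two
    products, one with A and one with A^T (the slide's '2 MVs')."""
    x = np.zeros_like(b)
    r = A.T @ b               # residual of the normal equations
    u = np.zeros_like(b)
    rho = 1.0
    while np.linalg.norm(r) > tol and maxit > 0:
        maxit -= 1
        rho_old = rho
        rho = r @ r
        beta = rho / rho_old
        u = r + beta * u
        w = A @ u             # MV with A
        c = A.T @ w           # MV with A^T
        alpha = rho / (w @ w) # u^T A^T A u = ||A u||^2
        r = r - alpha * c
        x = x + alpha * u
    return x
```

The squared condition number of A^T A is why this approach is usually slow.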
[Faber–Manteuffel '90] Theorem. For general square non-singular A, there is no Krylov solver that finds the best solution in the Krylov subspace K_k(A, r_0) using short recurrences.
Alternative. Construct residuals in a sequence of shrinking spaces (orthogonal to a sequence of growing spaces): adapt the construction of CG.
Conjugate Gradients, A = A^*, K = I
u_k = r_k − β_k u_{k−1}
r_{k+1} = r_k − α_k A u_k
Theorem. r_k, u_k ∈ K_{k+1}(A, r_0). r_0, ..., r_{k−1} is a Krylov basis of K_k(A, r_0). If r_k, A u_k ⊥ r_{k−1}, then r_k, A u_k ⊥ K_k(A, r_0).
Proof. A u_k = A r_k − β_k A u_{k−1} ⊥ r_{k−1} by construction of β_k;
A u_k = A r_k − β_k A u_{k−1} ⊥ K_{k−1}(A, r_0) by induction:
A r_k ⊥ K_{k−1}(A, r_0) ⟺ r_k ⊥ A K_{k−1}(A, r_0) ⟸ r_k ⊥ K_k(A, r_0).
Bi-Conjugate Gradients
u_k = r_k − β_k u_{k−1}
r_{k+1} = r_k − α_k A u_k
Theorem. We have r_k, u_k ∈ K_{k+1}(A, r_0). Suppose r̃_0, ..., r̃_{k−1} is a Krylov basis of K_k(A^*, r̃_0). If r_k, A u_k ⊥ r̃_{k−1}, then r_k, A u_k ⊥ K_k(A^*, r̃_0).
Proof. r_k = r_{k−1} − α_{k−1} A u_{k−1} ⊥ r̃_{k−1} by construction of α_{k−1};
r_k = r_{k−1} − α_{k−1} A u_{k−1} ⊥ K_{k−1}(A^*, r̃_0) by induction.
A u_k = A r_k − β_k A u_{k−1} ⊥ r̃_{k−1} by construction of β_k;
A u_k = A r_k − β_k A u_{k−1} ⊥ K_{k−1}(A^*, r̃_0) by induction:
A r_k ⊥ K_{k−1}(A^*, r̃_0) ⟺ r_k ⊥ A^* K_{k−1}(A^*, r̃_0) ⟸ r_k ⊥ K_k(A^*, r̃_0).
Bi-Conjugate Gradients
u_k = r_k − β_k u_{k−1}, r_{k+1} = r_k − α_k A u_k; r_k, u_k ∈ K_{k+1}(A, r_0), r_k, A u_k ⊥ r̃_{k−1}.
With ρ_k ≡ (r_k, r̃_k) and σ_k ≡ (A u_k, r̃_k), and, since r̃_k + ϑ_k A^* r̃_{k−1} ∈ K_k(A^*, r̃_0) for some ϑ_k (the bar denotes the complex conjugate), we have that
α_k = (r_k, r̃_k)/(A u_k, r̃_k) = ρ_k/σ_k  and  β_k = (A r_k, r̃_{k−1})/(A u_{k−1}, r̃_{k−1}) = (r_k, A^* r̃_{k−1})/σ_{k−1} = −ρ_k/(ϑ̄_k σ_{k−1}).
Bi-Conjugate Gradients
u_k = r_k − β_k u_{k−1}, r_{k+1} = r_k − α_k A u_k; r_k, u_k ∈ K_{k+1}(A, r_0), r_k, A u_k ⊥ r̃_{k−1}.
With ρ_k ≡ (r_k, q_k(A^*) r̃_0) and σ_k ≡ (A u_k, q_k(A^*) r̃_0), and, since q_k(ζ) + ϑ_k ζ q_{k−1}(ζ) ∈ P_{k−1} for some ϑ_k, we have that
α_k = ρ_k/σ_k  and  β_k = −ρ_k/(ϑ̄_k σ_{k−1}).
Bi-Conjugate Gradients
Classical Bi-CG [Fletcher '76] generates the shadow residuals r̃_k = q_k(A^*) r̃_0 with the same polynomial as r_k (q_k = p_k):
r_k = p_k(A) r_0, r̃_k = p_k(A^*) r̃_0: i.e., compute r̃_{k+1} as r̃_{k+1} = r̃_k − ᾱ_k A^* ũ_k, with ũ_k = r̃_k − β̄_k ũ_{k−1}. In particular, ϑ_k = α_{k−1}.
However, other choices for q_k are possible as well.
Example. q_k(ζ) = (1 − ω_{k−1} ζ) q_{k−1}(ζ) (ζ ∈ C). Then ϑ_k = ω_{k−1} and r̃_k = r̃_{k−1} − ω̄_{k−1} A^* r̃_{k−1}, with, for instance, ω_{k−1} chosen to minimize ‖r_k‖_2.
The next slide displays classical Bi-CG.
[Fletcher '76] Bi-CG
x = 0, r = b. Choose r̃.
u = 0, ρ = 1, ũ = 0
while ‖r‖ > tol do
  σ = −ρ, ρ = (r, r̃), β = ρ/σ
  u ← r − β u, c = A u;  ũ ← r̃ − β̄ ũ, c̃ = A^* ũ
  σ = (c, r̃), α = ρ/σ
  r ← r − α c, r̃ ← r̃ − ᾱ c̃
  x ← x + α u
end while
[Fletcher '76] Bi-CG (shadow recurrence written for c̃)
x = 0, r = b. Choose r̃.
u = 0, ρ = 1, c̃ = 0
while ‖r‖ > tol do
  σ = −ρ, ρ = (r, r̃), β = ρ/σ
  u ← r − β u, c = A u;  c̃ ← A^* r̃ − β̄ c̃
  σ = (c, r̃), α = ρ/σ
  r ← r − α c, r̃ ← r̃ − ᾱ c̃
  x ← x + α u
end while
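In real arithmetic the conjugations disappear, and classical Bi-CG can be sketched as follows — a NumPy illustration of the standard recurrences, not the lecture's code, with a random shadow residual as the slides recommend:

```python
import numpy as np

def bicg(A, b, tol=1e-8, maxit=500, seed=0):
    """Classical Bi-CG (real arithmetic). Shadow quantities are driven
    by A^T; two MVs per step, one with A and one with A^T."""
    rng = np.random.default_rng(seed)
    rt = rng.standard_normal(b.size)  # shadow residual r~
    x = np.zeros_like(b)
    r = b.copy()
    u = np.zeros_like(b)
    ut = np.zeros_like(b)             # shadow direction u~
    rho = 1.0
    while np.linalg.norm(r) > tol and maxit > 0:
        maxit -= 1
        rho_old = rho
        rho = rt @ r
        beta = rho / rho_old
        u = r + beta * u
        ut = rt + beta * ut
        c = A @ u                     # MV with A
        ct = A.T @ ut                 # MV with A^T
        alpha = rho / (ut @ c)
        r = r - alpha * c
        rt = rt - alpha * ct
        x = x + alpha * u
    return x
```

Note that only the update of x uses u; the shadow vectors serve purely to produce the scalars α and β.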
Selecting the initial shadow residual r̃_0.
Often recommended: r̃_0 = r_0. Practical experience: select r̃_0 randomly (unless A = A^*).
Exercise. Bi-CG and CG coincide if A is Hermitian and r̃_0 = r_0.
Exercise. Derive a version of Bi-CG that includes a preconditioner K. Show that Bi-CG and CG coincide if A and K are Hermitian and r̃_0 = K^{-1} r_0.
Exercise 8.9 gives an alternative derivation of Bi-CG.
Properties Bi-CG
Pros
• Usually selects good approximations from the search subspaces (Krylov subspaces).
• Low costs per step: 2 DOT, 5 AXPY.
• Low storage: 8 large vectors.
• No knowledge of properties of A is needed.
Cons
• Non-optimal Krylov subspace method.
• Not robust: Bi-CG may break down.
• Bi-CG is sensitive to evaluation errors (often loss of super-linear convergence).
• Convergence depends on the shadow residual r̃_0.
• 2 MVs needed to expand the search subspace by 1 vector; 1 MV is by A^*.
Program Lecture 8 — next: Bi-Lanczos
Bi-Lanczos
Find coefficients α_k, β_k, α̃_k and β̃_k such that (bi-orthogonalize)
γ_k v_{k+1} = A v_k − α_k v_k − β_k v_{k−1} ⊥ w_k, w_{k−1}, ...
γ̃_k w_{k+1} = A^* w_k − α̃_k w_k − β̃_k w_{k−1} ⊥ v_k, v_{k−1}, ...
Select appropriate scaling coefficients γ_k and γ̃_k. Then
A V_k = V_{k+1} H_k with H_k Hessenberg,
A^* W_k = W_{k+1} H̃_k with H̃_k Hessenberg, and W_{k+1}^* V_{k+1} = D_{k+1} diagonal.
Exercise. T_k ≡ W_k^* A V_k = D_k H_k = H̃_k^* D_k is tridiagonal. Exploit H_k = D_k^{-1} H̃_k^* D_k and the tridiagonal structure: Bi-Lanczos.
See Exercise 8.7 for details.
[Lanczos '50] Lanczos
ρ = ‖r_0‖, v_1 = r_0/ρ, β_0 = 0, v_0 = 0
for k = 1, 2, ... do
  ṽ = A v_k − β_{k−1} v_{k−1}
  α_k = v_k^* ṽ, ṽ ← ṽ − α_k v_k
  β_k = ‖ṽ‖, v_{k+1} = ṽ/β_k
end for
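A compact NumPy version of the Lanczos loop (my own sketch; it also assembles the extended tridiagonal T_k so that the decomposition A V_k = V_{k+1} T_k can be checked numerically):

```python
import numpy as np

def lanczos(A, r0, k):
    """Hermitian Lanczos: returns V (n x (k+1)) with orthonormal columns
    and the (k+1) x k tridiagonal T such that A V[:, :k] = V @ T.
    Breakdown (beta_k = 0) is not handled in this sketch."""
    n = r0.size
    V = np.zeros((n, k + 1))
    T = np.zeros((k + 1, k))
    V[:, 0] = r0 / np.linalg.norm(r0)
    beta = 0.0
    for j in range(k):
        v = A @ V[:, j]
        if j > 0:
            v = v - beta * V[:, j - 1]   # subtract beta_{k-1} v_{k-1}
            T[j - 1, j] = beta
        alpha = V[:, j] @ v
        v = v - alpha * V[:, j]
        beta = np.linalg.norm(v)
        T[j, j] = alpha
        T[j + 1, j] = beta
        V[:, j + 1] = v / beta
    return V, T
```

In floating point the orthogonality of V degrades with k; for a few steps it is still close to machine precision.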
[Lanczos '50] Bi-Lanczos
Select an r_0 and an r̃_0.
v_1 = r_0/‖r_0‖, v_0 = 0, w_1 = r̃_0/‖r̃_0‖, w_0 = 0
γ_0 = 0, δ_0 = 1, γ̃_0 = 0, δ̃_0 = 1
for k = 1, 2, ... do
  δ_k = w_k^* v_k
  ṽ = A v_k, β_k = γ̃_{k−1} δ_k/δ_{k−1}, ṽ ← ṽ − β_k v_{k−1}
  α_k = w_k^* ṽ/δ_k, ṽ ← ṽ − α_k v_k
  w̃ = A^* w_k, β̃_k = γ̄_{k−1} δ̄_k/δ̄_{k−1}, w̃ ← w̃ − β̃_k w_{k−1}
  α̃_k = ᾱ_k, w̃ ← w̃ − α̃_k w_k
  select γ_k ≠ 0 and γ̃_k ≠ 0
  v_{k+1} = ṽ/γ_k, w_{k+1} = w̃/γ̃_k
  V_k = [V_{k−1}, v_k], W_k = [W_{k−1}, w_k]
end for
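The same loop in NumPy, real arithmetic, with the unit-norm scaling γ_k = ‖ṽ‖, γ̃_k = ‖w̃‖ — a sketch of mine to make the bi-orthogonality W^* V = D (diagonal) checkable; breakdowns (δ_k = 0) are ignored:

```python
import numpy as np

def bilanczos(A, r0, rt0, k):
    """Bi-Lanczos (real arithmetic): returns V, W (n x k) with
    W^T V diagonal, built by the three-term recurrences of the slide."""
    n = r0.size
    V = np.zeros((n, k))
    W = np.zeros((n, k))
    V[:, 0] = r0 / np.linalg.norm(r0)
    W[:, 0] = rt0 / np.linalg.norm(rt0)
    gamma = gtil = 0.0
    delta_old = 1.0
    for j in range(k - 1):
        delta = W[:, j] @ V[:, j]
        vt = A @ V[:, j]
        wt = A.T @ W[:, j]
        if j > 0:  # slide: beta_k = gtil_{k-1} delta_k / delta_{k-1}
            vt = vt - (gtil * delta / delta_old) * V[:, j - 1]
            wt = wt - (gamma * delta / delta_old) * W[:, j - 1]
        alpha = (W[:, j] @ vt) / delta
        vt = vt - alpha * V[:, j]
        wt = wt - alpha * W[:, j]
        gamma = np.linalg.norm(vt)   # scaling gamma_k
        gtil = np.linalg.norm(wt)    # scaling gtil_k
        V[:, j + 1] = vt / gamma
        W[:, j + 1] = wt / gtil
        delta_old = delta
    return V, W
```

The diagonal shift in the test below merely keeps the δ_k safely away from zero for this small random example.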
Arnoldi: A V_k = V_{k+1} H_k. If A = A^*, then T_k ≡ H_k is tridiagonal: Lanczos.
Lanczos + T = LU + efficient implementation → CG.
Bi-Lanczos + T = LU + efficient implementation → Bi-CG.
Bi-CG may break down
0) Lucky breakdown if r_k = 0.
1) Pivot breakdown or LU-breakdown: the LU-decomposition may not exist. Corresponds to σ = 0 in Bi-CG.
Remedy. Composite step Bi-CG (skip once when forming T_k = L_k U_k), or form T = QR as in MINRES (from the beginning): simple → Quasi Minimal Residuals (QMR).
2) Bi-Lanczos may break down: a diagonal element of D_k may be zero. Corresponds to ρ = 0 in Bi-CG.
Remedy. Look-ahead (look-ahead in QMR).
General remedy. Restart.
Note. CG may suffer from pivot breakdown when applied to a Hermitian, non-definite matrix (A = A^* with positive as well as negative eigenvalues): MINRES and SYMMLQ cure this breakdown.
Note. Exact breakdowns are rare. However, near-breakdowns lead to irregular convergence and instabilities. This leads to loss of speed of convergence and loss of accuracy.
Properties Bi-CG (recap). Non-optimal; may break down; sensitive to evaluation errors; convergence depends on the shadow residual r̃_0; 2 MVs per expansion of the search subspace, one of them by A^*.
Program Lecture 8 — next: Hybrid Bi-CG
Bi-Conjugate Gradients, K = I
u_k = r_k − β_k u_{k−1}, r_{k+1} = r_k − α_k A u_k; r_k, u_k ∈ K_{k+1}(A, r_0), r_k, A u_k ⊥ r̃_{k−1}.
With ρ_k ≡ (r_k, q_k(A^*) r̃_0) and σ_k ≡ (A u_k, q_k(A^*) r̃_0), and, since q_k(ζ) + ϑ_k ζ q_{k−1}(ζ) ∈ P_{k−1} for some ϑ_k, we have that
α_k = ρ_k/σ_k  and  β_k = −ρ_k/(ϑ̄_k σ_{k−1}).
Transpose-free Bi-CG [Sonneveld '89]
ρ_k = (r_k, q_k(A^*) r̃_0) = (q_k(A) r_k, r̃_0), σ_k = (A u_k, q_k(A^*) r̃_0) = (A q_k(A) u_k, r̃_0). Write Q_k ≡ q_k(A).
(Bi-CG) ρ_k; Q_k u_k = Q_k r_k − β_k Q_k u_{k−1}; σ_k; Q_k r_{k+1} = Q_k r_k − α_k A Q_k u_k.
(Pol) Compute q_{k+1} of degree k+1 s.t. q_{k+1}(0) = 1. Compute Q_{k+1} u_k, Q_{k+1} r_{k+1} (from Q_k u_k, Q_k r_{k+1}, ...).
Example. q_{k+1}(ζ) = (1 − ω_k ζ) q_k(ζ).
Example. q_{k+1}(ζ) = (1 − ω_k ζ) q_k(ζ) (ζ ∈ C). Then
ω_k; Q_{k+1} u_k = Q_k u_k − ω_k A Q_k u_k, Q_{k+1} r_{k+1} = Q_k r_{k+1} − ω_k A Q_k r_{k+1}.
Transpose-free Bi-CG: practice
Work with u_k ≡ Q_k u_k^BiCG and r_k ≡ Q_k r_{k+1}^BiCG; write u_{k−1} and r_k instead of Q_k u_{k−1}^BiCG and Q_k r_k^BiCG, respectively. Then ρ_k = (r_k, r̃_0), σ_k = (A u_k, r̃_0).
(Bi-CG) ρ_k = (r_k, r̃_0); u_k = r_k − β_k u_{k−1}; σ_k = (A u_k, r̃_0); r_k ← r_k − α_k A u_k, x_k ← x_k + α_k u_k.
(Pol) Compute the updating coefficients for q_{k+1}. Compute u_k, r_{k+1}, x_{k+1}.
Example. ω_k; u_{k+1} = u_k − ω_k A u_k, r_{k+1} = r_k − ω_k A r_k, x_{k+1} = x_k + ω_k r_k.
Example. q_{k+1}(ζ) = (1 − ω_k ζ) q_k(ζ) (ζ ∈ C). How to choose ω_k?
Bi-CGSTABilized. With s_k ≡ A r_k,
ω_k = argmin_ω ‖r_k − ω A r_k‖_2 = (s_k^* r_k)/(s_k^* s_k),
as in the Local Minimal Residual method, or, equivalently, as in GCR(1).
BiCGSTAB [van der Vorst '92]
x = 0, r = b, u = 0, ω = σ = 1. Choose r̃.
while ‖r‖ > tol do
  σ ← −ωσ, ρ = (r, r̃), β = ρ/σ
  u ← r − β u, c = A u
  σ = (c, r̃), α = ρ/σ
  r ← r − α c, x ← x + α u
  s = A r, ω = (r, s)/(s, s)
  u ← u − ω c
  x ← x + ω r
  r ← r − ω s
end while
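For reference, the same method in its standard textbook form in NumPy — the slide folds these recurrences into fewer vectors via the σ ← −ωσ bookkeeping; this sketch of mine keeps the familiar p, v, s, t naming:

```python
import numpy as np

def bicgstab(A, b, tol=1e-8, maxit=500, seed=0):
    """BiCGSTAB: one Bi-CG step + one degree-1 minimal-residual step
    per iteration, two MVs, no products with A^T."""
    rng = np.random.default_rng(seed)
    rt = rng.standard_normal(b.size)   # shadow residual
    x = np.zeros_like(b)
    r = b.copy()
    p = np.zeros_like(b)
    v = np.zeros_like(b)
    rho = alpha = omega = 1.0
    while np.linalg.norm(r) > tol and maxit > 0:
        maxit -= 1
        rho_old = rho
        rho = rt @ r
        beta = (rho / rho_old) * (alpha / omega)
        p = r + beta * (p - omega * v)
        v = A @ p                       # first MV: Bi-CG part
        alpha = rho / (rt @ v)
        s = r - alpha * v
        t = A @ s                       # second MV: stabilization part
        omega = (t @ s) / (t @ t)       # argmin_w || s - w t ||_2
        x = x + alpha * p + omega * s
        r = s - omega * t
    return x
```

Each iteration expands the search subspace by two, matching the two MVs.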
Hybrid Bi-CG or product-type Bi-CG
r_k ≡ q_k(A) r_k^BiCG = q_k(A) p_k^BiCG(A) r_0, where p_k^BiCG is the kth Bi-CG residual polynomial.
How to select q_k? q_k for efficient steps & fast convergence. Fast convergence by:
• reducing the residual;
• stabilizing the Bi-CG part;
• other: when used as linear solver for the Jacobian system in a Newton scheme for non-linear equations, by reducing the number of Newton steps.
Hybrid Bi-CG examples.
CGS: Bi-CG × Bi-CG — Sonneveld [1989]
Bi-CGSTAB: GCR(1) × Bi-CG — van der Vorst [1992]
GPBi-CG: 2-truncated GCR × Bi-CG — Zhang [1997]
BiCGstab(l): GCR(l) × Bi-CG — Sleijpen–Fokkema [1993]
For more details on hybrid Bi-CG, see Exercise 8.11 and Exercise 8.12. For a derivation of GPBi-CG, see Exercise 8.13.
Properties hybrid Bi-CG
Pros
• Converges often twice as fast as Bi-CG w.r.t. # MVs: each MV expands the search subspace. Bi-CG: x_k − x_0 ∈ K_k(A; r_0) after 2k MVs. Hybrid Bi-CG: x_k − x_0 ∈ K_{2k}(A; r_0) after 2k MVs.
• Work/MV and storage similar to Bi-CG.
• Transpose free.
• Explicit computation of the Bi-CG scalars.
Cons
• Non-optimal Krylov subspace method.
• Peaks in the convergence history.
• Large intermediate residuals.
• Breakdown possibilities.
Conjugate Gradients Squared [Sonneveld '89]
r_k = p_k^BiCG(A) p_k^BiCG(A) r_0.
CGS exploits recurrence relations for the Bi-CG polynomials to design a very efficient algorithm.
Properties
+ Hybrid Bi-CG.
+ A very efficient algorithm: 1 DOT/MV, 3.25 AXPY/MV; storage: 7 large vectors.
− Often high peaks in its convergence history.
− Often large intermediate residuals.
+ Seems to do well as linear solver in a Newton scheme.
[Sonneveld '89] Conjugate Gradients Squared
x = 0, r = b, u = w = 0, ρ = 1. Choose r̃.
while ‖r‖ > tol do
  σ = −ρ, ρ = (r, r̃), β = ρ/σ
  w ← u − β w
  v = r − β u
  w ← v − β w, c = A w
  σ = (c, r̃), α = ρ/σ
  u = v − α c
  r ← r − α A(v + u)
  x ← x + α (v + u)
end while
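The same recurrences in the standard Sonneveld naming (u, p, q) — a NumPy sketch of mine; the variable names differ from the slide but the squared-polynomial residual r_k = p_k(A)² r_0 is the same:

```python
import numpy as np

def cgs(A, b, tol=1e-8, maxit=500, seed=0):
    """CGS: transpose-free, two MVs per step, both expanding the
    search subspace (hence 'twice as fast as Bi-CG per MV')."""
    rng = np.random.default_rng(seed)
    rt = rng.standard_normal(b.size)   # shadow residual
    x = np.zeros_like(b)
    r = b.copy()
    p = np.zeros_like(b)
    q = np.zeros_like(b)
    rho = 1.0
    while np.linalg.norm(r) > tol and maxit > 0:
        maxit -= 1
        rho_old = rho
        rho = rt @ r
        beta = rho / rho_old
        u = r + beta * q
        p = u + beta * (q + beta * p)
        v = A @ p                      # first MV
        alpha = rho / (rt @ v)
        q = u - alpha * v
        w = u + q
        x = x + alpha * w
        r = r - alpha * (A @ w)        # second MV
    return x
```

The squared polynomial is what makes CGS fast but also what amplifies its peaks.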
Program Lecture 8 — next: Bi-CGSTAB, BiCGstab(l)
Properties Bi-CGSTAB
Pros
• Hybrid Bi-CG.
• Converges faster (& smoother) than CGS.
• More accurate than CGS.
• 2 DOT/MV, 3 AXPY/MV. Storage: 6 large vectors.
Cons
• Danger of (A) Lanczos breakdown (ρ_k = 0), (B) pivot breakdown (σ_k = 0), (C) breakdown of the minimization (ω_k = 0).
[Convergence history: log10 of residual norm vs. # MVs for Bi-CG, BiCGSTAB, GCR(100), 100-truncated GCR.]
−(a u_x)_x − (a u_y)_y = 1 on [0,1]×[0,1]; a = 1000 for 0.1 ≤ x, y ≤ 0.9 and a = 1 elsewhere. Dirichlet BC on y = 0, Neumann BC on the other parts of the boundary. 82×82 volumes. ILU decomposition.
[Figure: definition of the coefficient a = A and the source f = F on [0,1]×[0,1]: boundary values u = 0 and u = 1 on parts of the boundary; F = 100 in a subregion, F = 0 elsewhere; A takes the values 10^4, 10^{−5} and 10^2 in subregions, definition of a = A and of f = F.]
−(a u_x)_x − (a u_y)_y + b u_x = f on [0,1]×[0,1].
[Convergence history: log10 of residual norm vs. number of matrix multiplications for Bi-CG (dash-dot), Bi-CGSTAB (dashed), BiCGstab(2) (solid).]
−(a u_x)_x − (a u_y)_y + b u_x = f on [0,1]×[0,1]; b(x,y) = 2 exp(2(x² + y²)), a changes strongly. Dirichlet BC. 129×129 volumes. ILU decomposition.
[Same experiment on a finer grid: log10 of residual norm vs. number of matrix multiplications for Bi-CG (dash-dot), Bi-CGSTAB (dashed), BiCGstab(2) (solid).]
−(a u_x)_x − (a u_y)_y + b u_x = f on [0,1]×[0,1]; b(x,y) = 2 exp(2(x² + y²)), a changes strongly. Dirichlet BC. 201×201 volumes. ILU decomposition.
Breakdown of the minimization
Exact arithmetic, ω_k = 0: no reduction of the residual by Q_{k+1} r_{k+1} = (I − ω_k A) Q_k r_{k+1}^BiCG (*); q_{k+1} is of degree k: the Bi-CG scalars cannot be computed; breakdown of the incorporated Bi-CG.
Finite precision arithmetic, ω_k ≈ 0: poor reduction of the residual by (*); the Bi-CG scalars are seriously affected by evaluation errors: drop of the speed of convergence.
ω_k ≈ 0 is to be expected if A is real and A has eigenvalues with relatively large imaginary part: ω_k is real!
BiCGstab(l). Cycle l times through the Bi-CG part to compute A^j u', A^j r' for j = 0, ..., l, where now u' ≡ Q_k u_{k+l−1}^BiCG and r' ≡ Q_k r_{k+l}^BiCG for k = ml. Then
γ_m = argmin_γ ‖r' − [A r', ..., A^l r'] γ‖_2.
r_{k+l} = r' − [A r', ..., A^l r'] γ_m,
q_{k+l}(ζ) = (1 − [ζ, ..., ζ^l] γ_m) q_k(ζ)  (ζ ∈ C).
BiCGstab(l) for l ≥ 2 [Sleijpen–Fokkema '93, Sleijpen–van der Vorst–Fokkema '94]
q_{k+1}(A) = A q_k(A) for k ≠ ml; q_{ml+l}(A) = ϕ_m(A) q_{ml}(A),
where ϕ_m is of exact degree l, ϕ_m(0) = 1, and, for k = ml, ϕ_m minimizes ‖ϕ_m(A) q_{ml}(A) r_{ml+l}^BiCG‖_2 = ‖ϕ_m(A) r'‖_2.
ϕ_m is a GCR residual polynomial of degree l. Note that real polynomials of degree ≥ 2 can have complex zeros.
Minimization in practice: ϕ_m(ζ) = 1 − Σ_{j=1}^{l} γ_j^{(m)} ζ^j,
(γ_j^{(m)})_j = argmin_{(γ_j)} ‖r' − Σ_{j=1}^{l} γ_j A^j r'‖_2.
Compute A r', A² r', ..., A^l r' explicitly. With R ≡ [A r', ..., A^l r'] and γ_m ≡ (γ_1^{(m)}, ..., γ_l^{(m)})^T we have the normal equations (use Cholesky):
(R^* R) γ_m = R^* r'.
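The normal-equations step above is small (l × l) and cheap; as a NumPy sketch of mine:

```python
import numpy as np

def mr_poly_coeffs(A, r, ell):
    """Coefficients gamma minimizing || r - [A r, ..., A^ell r] gamma ||_2
    via the normal equations (R^T R) gamma = R^T r, as in BiCGstab(ell)."""
    R = np.empty((r.size, ell))
    v = r.copy()
    for j in range(ell):
        v = A @ v                 # build A r, A^2 r, ..., A^ell r
        R[:, j] = v
    G = R.T @ R                   # ell x ell Gram matrix (SPD)
    return np.linalg.solve(G, R.T @ r)   # Cholesky would also do
```

The minimizer leaves the residual r − R γ orthogonal to the columns of R, which is exactly what the normal equations express.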
BiCGstab(l)
x = 0, r = [b]. Choose r̃. u = [0], γ_l = σ = 1.
while ‖r‖ > tol do
  σ ← −γ_l σ
  for j = 1 to l do
    ρ = (r_j, r̃), β = ρ/σ
    u ← r − β u, u ← [u, A u_j]
    σ = (u_{j+1}, r̃), α = ρ/σ
    r ← r − α u_{2:j+1}, r ← [r, A r_j]
    x ← x + α u_1
  end for
  R ≡ r_{2:l+1}. Solve (R^* R) γ = R^* r_1 for γ.
  x ← x + (γ_1 r_1 + ... + γ_l r_l)
  u ← [u_1 − (γ_1 u_2 + ... + γ_l u_{l+1})]
  r ← [r_1 − (γ_1 r_2 + ... + γ_l r_{l+1})]
end while
% BiCGstab(ell) in MATLAB; MV(y) should return A*y.
% Note: the x-update must use the residual block r before it is
% collapsed to a single column (the original listing did this too late).
epsilon = 10^(-16); ell = 4;
x = zeros(size(b)); rt = rand(size(b));
sigma = 1; omega = 1; u = zeros(size(b));
y = MV(x); r = b - y;
nrm = r'*r; nepsilon = nrm*epsilon^2;   % compare squared norms
L = 2:ell+1;
while nrm > nepsilon
  sigma = -omega*sigma; y = r;
  for j = 1:ell
    rho = rt'*y; beta = rho/sigma;
    u = r - beta*u;
    y = MV(u(:,j)); u(:,j+1) = y;
    sigma = rt'*y; alpha = rho/sigma;
    r = r - alpha*u(:,2:j+1);
    x = x + alpha*u(:,1);
    y = MV(r(:,j)); r(:,j+1) = y;
  end
  G = r'*r;                        % Gram matrix; normal equations
  gamma = G(L,L)\G(L,1); omega = gamma(ell);
  x = x + r*[gamma; 0];            % uses the old columns r_1, ..., r_ell
  u = u*[1; -gamma]; r = r*[1; -gamma];
  nrm = r'*r;
end
[Convergence history: log10 of residual norm vs. number of matrix multiplications for Bi-CG (dash-dot), Bi-CGSTAB (dashed), BiCGstab(2) (solid).]
u_xx + u_yy + u_zz + 1000 u_x = f; f such that u(x,y,z) = exp(xyz) sin(πx) sin(πy) sin(πz). 52×52×52 volumes. No preconditioning.
[Convergence history: log10 of residual norm vs. number of matrix multiplications for BiCGStab2 (dash-dot), Bi-CGSTAB (dotted), BiCGstab(2) (dashed), BiCGstab(4) (solid).]
−(a u_x)_x − (a u_y)_y = 1 on [0,1]×[0,1]; a = 1000 for 0.1 ≤ x, y ≤ 0.9 and a = 1 elsewhere. Dirichlet BC on y = 0, Neumann BC on the other parts of the boundary. 200×200 volumes. ILU decomposition.
[Convergence history: log10 of residual norm vs. number of matrix multiplications for BiCGStab2 (dash-dot), Bi-CGSTAB (dotted), BiCGstab(2) (dashed), BiCGstab(4) (solid).]
−ϵ(u_xx + u_yy) + a(x,y) u_x + b(x,y) u_y = 0 on [0,1]×[0,1], Dirichlet BC; ϵ = 10^{−1}, a(x,y) = 4x(x−1)(1−2y), b(x,y) = 4y(1−y)(1−2x), u(x,y) = sin(πx) + sin(13πx) + sin(πy) + sin(13πy). 201×201 volumes, no preconditioning.
[Convergence history: log10 of residual norm vs. number of matrix multiplications.]
u_xx + u_yy + u_zz + 1000 u_x = f; f is defined by the solution u(x,y,z) = exp(xyz) sin(πx) sin(πy) sin(πz). 10×10×10 volumes. No preconditioning.
Accurate Bi-CG coefficients
ρ_k = (r_k, r̃_0) is computed as ρ'_k = ρ_k (1 + ϵ) with
|ϵ| ≤ n ξ ‖r_k‖_2 ‖r̃_0‖_2 / |(r_k, r̃_0)| = n ξ / ρ̂_k, where ρ̂_k ≡ |(r_k, r̃_0)| / (‖r_k‖_2 ‖r̃_0‖_2).
[Convergence history.]
Why use polynomial factors of degree ≥ 2?
Hybrid Bi-CG that is faster than Bi-CGSTAB: 1 sweep of BiCGstab(l) versus l steps of Bi-CGSTAB:
• reduction with an MR-polynomial of degree l is better than with l MR-polynomials of degree 1;
• the MR-polynomial of degree l contributes only once to an increase of ρ̂_k.
Why not?
• Efficiency: 1.75 + 0.25 l DOT/MV, 2.5 + 0.5 l AXPY/MV; storage: 2l + 5 large vectors.
• Loss of accuracy: the gap ‖r_k − (b − A x_k)‖ grows with terms like c ξ max_i |γ_i| ‖A‖ ‖A^{i−1} r'‖.
• Breakdowns are possible.
Properties Bi-CG (recap). Non-optimal; may break down; sensitive to evaluation errors; convergence depends on the shadow residual r̃_0; 2 MVs per expansion of the search subspace, one of them by A^*.
Program Lecture 8 — next: IDR
Hybrid Bi-CG
Notation. If p_k is a polynomial of exact degree k and r̃_0 an n-vector, let
S(p_k, A, r̃_0) ≡ {p_k(A) v : v ⊥ K_k(A^*, r̃_0)}.
Theorem. Hybrid Bi-CG finds residuals r_k ∈ S(p_k, A, r̃_0).
Example. Bi-CGSTAB: p_k(λ) = (1 − ω_k λ) p_{k−1}(λ), where, in every step,
ω_k = argmin_ω ‖r − ω A r‖_2, with r = p_{k−1}(A) v, v = r_k^BiCG.
Example. BiCGstab(l): p_k(λ) = (1 − ω_k λ) p_{k−1}(λ), where, every lth step,
γ = argmin_γ ‖r − [A r, ..., A^l r] γ‖_2, with r = p_{k−l}(A) r_k^BiCG, and
(1 − γ_1 λ − ... − γ_l λ^l) = (1 − ω_k λ) ··· (1 − ω_{k−l+1} λ).
Induced Dimension Reduction
Definition. If p_k is a polynomial of exact degree k and R̃ ≡ R̃_0 = [r̃_1, ..., r̃_s] an n×s matrix, then
S(p_k, A, R̃) ≡ {p_k(A) v : v ⊥ K_k(A^*, R̃)}
is the p_k-Sonneveld subspace. Here K_k(A^*, R̃) ≡ {Σ_{j=0}^{k−1} (A^*)^j R̃ γ_j : γ_j ∈ C^s}.
Theorem. IDR finds residuals r_k ∈ S(p_k, A, R̃).
Example. Bi-CGSTAB (s = 1): p_k(λ) = (1 − ω_k λ) p_{k−1}(λ), where, in every step, ω_k = argmin_ω ‖r − ω A r‖_2, with r = p_{k−1}(A) v, v = r_k^BiCG.
[van Gijzen–Sonneveld '07] IDR
Select an x_0. Select n×s matrices U and R̃. Compute C ≡ A U.
x = x_0, r = b − A x, j = s, i = 1
while ‖r‖ > tol do
  solve R̃^* C γ = R̃^* r for γ
  v = r − C γ, t = A v  (t written here for the slide's s = A v, to avoid a clash with the block size s)
  j++; if j > s: ω = t^* v / (t^* t), j = 0
  U e_i ← U γ + ω v, x ← x + U e_i
  r_old = r, r ← v − ω t, C e_i ← r_old − r
  i++; if i > s: i = 1
end while
Select n×s matrices U and R̃. Experiments suggest R̃ = qr(rand(n, s), 0), i.e., an orthonormalized random matrix. U and C can be constructed from s steps of GCR.
We will discuss IDR in more detail in Lecture 11. See also Exercises 11.1–11.5.