Utrecht, November 15, 2017 — Fast iterative solvers — Gerard Sleijpen, Department of Mathematics, http://www.staff.science.uu.nl/~sleij101/
Review. To simplify, take x_0 = 0 and assume ‖b‖_2 = 1. Solving Ax = b for a general square matrix A.
Arnoldi decomposition: A V_k = V_{k+1} H_k with V_k orthonormal, H_k Hessenberg; H_k = L_k U_k = Q_k R_k.
v_1 = r_0 = b, x_k = V_k y_k, r_k = b − A x_k.
FOM: r_k ⊥ V_k ⟺ H_k y_k = e_1, so x_k = V_k (U_k^{-1} (L_k^{-1} e_1)).
GMRES: y_k = argmin_y ‖b − A V_k y‖_2 = argmin_y ‖e_1 − H_k y‖_2, so x_k = V_k (R_k^{-1} (Q_k^* e_1)).
The columns of V_k span K_k(A, r_0) ≡ {p(A) r_0 : p ∈ P_{k−1}}; K_k(A, r_0) = span(r_0, r_1, ..., r_{k−1}).
Review. To simplify, take x_0 = 0 and assume ‖b‖_2 = 1. Solving Ax = b for a Hermitian matrix A.
Lanczos decomposition: A V_k = V_{k+1} T_k with V_k orthonormal, T_k tridiagonal; T_k = L_k U_k = Q_k R_k.
v_1 = r_0 = b, x_k = V_k y_k, r_k = b − A x_k.
CG: r_k ⊥ V_k ⟺ T_k y_k = e_1, so x_k = V_k (U_k^{-1} (L_k^{-1} e_1)).
MINRES: y_k = argmin_y ‖b − A V_k y‖_2 = argmin_y ‖e_1 − T_k y‖_2, so x_k = (V_k R_k^{-1}) (Q_k^* e_1).
Note. Short recurrences (efficient computation) because T_k is tridiagonal, and because Ũ_k U_k = V_k (CG) and W_k R_k = V_k (MINRES) give short vector recurrences for the columns of Ũ_k and W_k.
Details. For MINRES, see Exercise 7.8.
Note. Less stable: we rely on exact arithmetic for the orthogonality of V_k (CG & MINRES). W_k R_k = V_k (MINRES) is a three-term vector recurrence for the w_k's.
Note. If A is positive definite, then CG minimizes as well: y_k = argmin_y ‖x − V_k y‖_A = argmin_y ‖b − A V_k y‖_{A^{-1}}.
SYMMLQ. Take x_k = A V_k y_k with y_k = argmin_y ‖x − A V_k y‖_2 ⟺ e_1 − T_k^* T_k y_k = 0, so x_k = V_{k+1} T_k (T_k^* T_k)^{-1} e_1 = (V_{k+1} Q_k) ((R_k^*)^{-1} e_1).
Details. For SYMMLQ, see Exercise 7.9.
FOM residual polynomials and Ritz values
Property. r_k^CG = r_k^FOM = p_k(A) r_0 ⊥ V_k, for some residual polynomial p_k of degree k, i.e., p_k(0) = 1. p_k^FOM = p_k is the kth (CG or) FOM residual polynomial.
Theorem. p_k^FOM(ϑ) = 0 ⟺ H_k y = ϑ y for some y ≠ 0: the zeros of p_k^FOM are precisely the kth order Ritz values. In particular,
p_k^FOM(λ) = ∏_{j=1}^{k} (1 − λ/ϑ_j)  (λ ∈ C).
Moreover, for a polynomial p of degree at most k with p(0) = 1, we have that p = p_k^FOM ⟺ p(H_k) = 0.
Proof. Exercise 6.6.
Theorem. Similarly, the zeros of p_k^GMRES are related to the harmonic Ritz values of H_k.
Solving Ax = b, an overview
[Flowchart, reconstructed as a decision outline:]
A = A^*?
  yes: A > 0? yes → CG; no: ill conditioned? yes → SYMMLQ; no → MINRES.
  no: good preconditioner available? no → GMRES; flexible preconditioner? yes → GCR;
      else: A + A^* strongly indefinite, or A has large imaginary eigenvalues?
        no → Bi-CGSTAB or IDR; yes → BiCGstab(l) or IDRstab.
Legend: "good precond." — a good preconditioner is available; "flex. precond." — the preconditioner is flexible; "str. indef." — A + A^* is strongly indefinite; "large im. eig." — A has large imaginary eigenvalues.
Ax = b with A an n×n non-singular matrix.
Today's topic. Iterative methods for general systems using short recurrences.
Program Lecture 8
• CG
• Bi-CG
• Bi-Lanczos
• Hybrid Bi-CG: Bi-CGSTAB, BiCGstab(l)
• IDR
A = A^* > 0: Conjugate Gradients [Hestenes–Stiefel '52]
x = 0, r = b, u = 0, ρ = 1
while ‖r‖ > tol do
  σ = −ρ, ρ = r^* r, β = ρ/σ
  u ← r − β u, c = A u
  σ = u^* c, α = ρ/σ
  r ← r − α c
  x ← x + α u
end while
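As a concrete illustration, the loop above translates almost line for line into NumPy. This is a minimal sketch, not the lecture's reference code; the slide's σ = −ρ bookkeeping is absorbed here into a positive β:

```python
import numpy as np

def cg(A, b, tol=1e-8, maxit=200):
    """Unpreconditioned CG for Hermitian positive definite A.
    One MV, two inner products, three vector updates per step."""
    x = np.zeros_like(b)
    r = b.copy()              # residual b - A x with x = 0
    u = np.zeros_like(b)
    rho = 1.0
    while np.linalg.norm(r) > tol and maxit > 0:
        maxit -= 1
        rho_old = rho
        rho = r @ r
        beta = rho / rho_old  # slide: beta = rho/sigma with sigma = -rho_old
        u = r + beta * u      # search direction
        c = A @ u             # the single matrix-vector product
        alpha = rho / (u @ c)
        r = r - alpha * c
        x = x + alpha * u
    return x
```

For a 2×2 SPD system the loop terminates (in exact arithmetic) after two steps.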
Construction of CG. There are four alternative derivations of CG:
• GCR (use A = A^*) → CR: use the A^{-1} inner product + efficient implementation.
• Lanczos + T = LU + efficient implementation.
• Orthogonalize residuals.
• Nonlinear CG to solve x = argmin_x̃ ½ ‖b − A x̃‖²_{A^{-1}}. [Exercise 7.3]
Conjugate Gradients, A = A^*, K = K^*
u_k = K^{-1} r_k − β_k u_{k−1}
r_{k+1} = r_k − α_k A u_k
Theorem. r_k, K u_k ∈ K_{k+1}(A K^{-1}, r_0). r_0, ..., r_{k−1} is a Krylov basis of K_k(A K^{-1}, r_0). If r_k, A u_k ⊥_{K^{-1}} r_{k−1}, then r_k, A u_k ⊥_{K^{-1}} K_k(A K^{-1}, r_0).
Proof. r_k = r_{k−1} − α_{k−1} A u_{k−1} ⊥_{K^{-1}} r_{k−1} by construction of α_{k−1};
r_k = r_{k−1} − α_{k−1} A u_{k−1} ⊥_{K^{-1}} K_{k−1}(A K^{-1}, r_0) by induction.
Proof (continued). A u_k = A K^{-1} r_k − β_k A u_{k−1} ⊥_{K^{-1}} r_{k−1} by construction of β_k;
A u_k = A K^{-1} r_k − β_k A u_{k−1} ⊥_{K^{-1}} K_{k−1}(A K^{-1}, r_0) by induction:
A K^{-1} r_k ⊥_{K^{-1}} K_{k−1}(A K^{-1}, r_0) ⟺ r_k ⊥_{K^{-1}} A K^{-1} K_{k−1}(A K^{-1}, r_0) ⟸ r_k ⊥_{K^{-1}} K_k(A K^{-1}, r_0).
A = A^* & K = K^*: Preconditioned CG
x = 0, r = b, u = 0, ρ = 1
while ‖r‖ > tol do
  solve K c = r for c
  σ = −ρ, ρ = c^* r, β = ρ/σ
  u ← c − β u, c = A u
  σ = u^* c, α = ρ/σ
  r ← r − α c
  x ← x + α u
end while
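The preconditioned loop differs from plain CG only in the solve with K. A minimal NumPy sketch (my own illustration; `Ksolve` applies K^{-1}, here again with β written in the standard positive-sign form):

```python
import numpy as np

def pcg(A, b, Ksolve, tol=1e-8, maxit=200):
    """Preconditioned CG; Ksolve(r) returns the solution c of K c = r."""
    x = np.zeros_like(b)
    r = b.copy()
    u = np.zeros_like(b)
    rho = 1.0
    while np.linalg.norm(r) > tol and maxit > 0:
        maxit -= 1
        c = Ksolve(r)          # preconditioner solve K c = r
        rho_old = rho
        rho = c @ r            # K^{-1}-inner product of the residual
        beta = rho / rho_old
        u = c + beta * u
        c = A @ u
        alpha = rho / (u @ c)
        r = r - alpha * c
        x = x + alpha * u
    return x
```

With the Jacobi preconditioner, `Ksolve = lambda r: r / np.diag(A)`.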
Properties CG
Pros
• Low costs per step: 1 MV, 2 DOT, 3 AXPY to increase the dimension of the Krylov subspace by one.
• Low storage: 5 large vectors (incl. b).
• Minimal residual method if A, K positive definite: ‖r_k‖_{A^{-1}} is minimal.
• Orthogonal residual method if A = A^*, K = K^*: r_k ⊥_{K^{-1}} K_k(A K^{-1}; r_0).
• No additional knowledge of properties of A is needed.
• Robust: CG always converges if A, K are positive definite.
Cons
• May break down if A = A^* is indefinite. Does not work if A ≠ A^*.
• CG is sensitive to evaluation errors if A = A^* is indefinite.
• Often loss of a) super-linear convergence, and b) accuracy, for two reasons: 1) loss of orthogonality in the Lanczos recursion; 2) as in FOM, bumps and peaks in the CG convergence history.
Program Lecture 8 — next: Bi-CG
For general square non-singular A
• Apply CG to the normal equations (A^* A x = A^* b): CGNE.
• Apply CG to A A^* y = b (then x = A^* y): Craig's method.
Disadvantage. Search in K_k(A^* A, ...): even if A = A^*, convergence is determined by A², i.e., the condition number is squared, .... Expansion of K_k requires 2 MVs (i.e., many costly steps).
For a discussion of Craig's method, see Exercise 8.1. For Craig versus GCR, see Exercise 8.6.
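The first option above (CG on A^* A x = A^* b) can be sketched in a few lines of NumPy — my own illustration, showing the two matrix-vector products per step that the slide counts:

```python
import numpy as np

def cg_normal_equations(A, b, tol=1e-10, maxit=500):
    """CG applied to A^T A x = A^T b. Each Krylov expansion costs two
    products, one with A and one with A^T (the slide's '2 MVs')."""
    x = np.zeros_like(b)
    r = A.T @ b               # residual of the normal equations
    u = np.zeros_like(b)
    rho = 1.0
    while np.linalg.norm(r) > tol and maxit > 0:
        maxit -= 1
        rho_old = rho
        rho = r @ r
        beta = rho / rho_old
        u = r + beta * u
        w = A @ u             # MV with A
        c = A.T @ w           # MV with A^T
        alpha = rho / (w @ w) # u^T A^T A u = ||A u||^2
        r = r - alpha * c
        x = x + alpha * u
    return x
```

The squared condition number of A^T A is why this approach is usually slow.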
[Faber–Manteuffel '90] Theorem. For general square non-singular A, there is no Krylov solver that finds the best solution in the Krylov subspace K_k(A, r_0) using short recurrences.
Alternative. Construct residuals in a sequence of shrinking spaces (orthogonal to a sequence of growing spaces): adapt the construction of CG.
Conjugate Gradients, A = A^*, K = I
u_k = r_k − β_k u_{k−1}
r_{k+1} = r_k − α_k A u_k
Theorem. r_k, u_k ∈ K_{k+1}(A, r_0). r_0, ..., r_{k−1} is a Krylov basis of K_k(A, r_0). If r_k, A u_k ⊥ r_{k−1}, then r_k, A u_k ⊥ K_k(A, r_0).
Proof. A u_k = A r_k − β_k A u_{k−1} ⊥ r_{k−1} by construction of β_k;
A u_k = A r_k − β_k A u_{k−1} ⊥ K_{k−1}(A, r_0) by induction:
A r_k ⊥ K_{k−1}(A, r_0) ⟺ r_k ⊥ A K_{k−1}(A, r_0) ⟸ r_k ⊥ K_k(A, r_0).
Bi-Conjugate Gradients
u_k = r_k − β_k u_{k−1}
r_{k+1} = r_k − α_k A u_k
Theorem. We have r_k, u_k ∈ K_{k+1}(A, r_0). Suppose r̃_0, ..., r̃_{k−1} is a Krylov basis of K_k(A^*, r̃_0). If r_k, A u_k ⊥ r̃_{k−1}, then r_k, A u_k ⊥ K_k(A^*, r̃_0).
Proof. r_k = r_{k−1} − α_{k−1} A u_{k−1} ⊥ r̃_{k−1} by construction of α_{k−1};
r_k = r_{k−1} − α_{k−1} A u_{k−1} ⊥ K_{k−1}(A^*, r̃_0) by induction.
A u_k = A r_k − β_k A u_{k−1} ⊥ r̃_{k−1} by construction of β_k;
A u_k = A r_k − β_k A u_{k−1} ⊥ K_{k−1}(A^*, r̃_0) by induction:
A r_k ⊥ K_{k−1}(A^*, r̃_0) ⟺ r_k ⊥ A^* K_{k−1}(A^*, r̃_0) ⟸ r_k ⊥ K_k(A^*, r̃_0).
Bi-Conjugate Gradients
u_k = r_k − β_k u_{k−1}, r_{k+1} = r_k − α_k A u_k; r_k, u_k ∈ K_{k+1}(A, r_0), r_k, A u_k ⊥ r̃_{k−1}.
With ρ_k ≡ (r_k, r̃_k) and σ_k ≡ (A u_k, r̃_k), and, since r̃_k + ϑ_k A^* r̃_{k−1} ∈ K_k(A^*, r̃_0) for some ϑ_k (the bar denotes the complex conjugate), we have that
α_k = (r_k, r̃_k)/(A u_k, r̃_k) = ρ_k/σ_k  and  β_k = (A r_k, r̃_{k−1})/(A u_{k−1}, r̃_{k−1}) = (r_k, A^* r̃_{k−1})/σ_{k−1} = −ρ_k/(ϑ̄_k σ_{k−1}).
Bi-Conjugate Gradients
u_k = r_k − β_k u_{k−1}, r_{k+1} = r_k − α_k A u_k; r_k, u_k ∈ K_{k+1}(A, r_0), r_k, A u_k ⊥ r̃_{k−1}.
With ρ_k ≡ (r_k, q_k(A^*) r̃_0) and σ_k ≡ (A u_k, q_k(A^*) r̃_0), and, since q_k(ζ) + ϑ_k ζ q_{k−1}(ζ) ∈ P_{k−1} for some ϑ_k, we have that
α_k = ρ_k/σ_k  and  β_k = −ρ_k/(ϑ̄_k σ_{k−1}).
Bi-Conjugate Gradients
Classical Bi-CG [Fletcher '76] generates the shadow residuals r̃_k = q_k(A^*) r̃_0 with the same polynomial as r_k (q_k = p_k):
r_k = p_k(A) r_0, r̃_k = p_k(A^*) r̃_0: i.e., compute r̃_{k+1} as r̃_{k+1} = r̃_k − ᾱ_k A^* ũ_k, with ũ_k = r̃_k − β̄_k ũ_{k−1}. In particular, ϑ_k = α_{k−1}.
However, other choices for q_k are possible as well.
Example. q_k(ζ) = (1 − ω_{k−1} ζ) q_{k−1}(ζ) (ζ ∈ C). Then ϑ_k = ω_{k−1} and r̃_k = r̃_{k−1} − ω̄_{k−1} A^* r̃_{k−1}, with, for instance, ω_{k−1} chosen to minimize ‖r_k‖_2.
The next slide displays classical Bi-CG.
[Fletcher '76] Bi-CG
x = 0, r = b. Choose r̃.
u = 0, ρ = 1, ũ = 0
while ‖r‖ > tol do
  σ = −ρ, ρ = (r, r̃), β = ρ/σ
  u ← r − β u, c = A u;  ũ ← r̃ − β̄ ũ, c̃ = A^* ũ
  σ = (c, r̃), α = ρ/σ
  r ← r − α c, r̃ ← r̃ − ᾱ c̃
  x ← x + α u
end while
[Fletcher '76] Bi-CG (shadow recurrence written for c̃)
x = 0, r = b. Choose r̃.
u = 0, ρ = 1, c̃ = 0
while ‖r‖ > tol do
  σ = −ρ, ρ = (r, r̃), β = ρ/σ
  u ← r − β u, c = A u;  c̃ ← A^* r̃ − β̄ c̃
  σ = (c, r̃), α = ρ/σ
  r ← r − α c, r̃ ← r̃ − ᾱ c̃
  x ← x + α u
end while
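In real arithmetic the conjugations disappear, and classical Bi-CG can be sketched as follows — a NumPy illustration of the standard recurrences, not the lecture's code, with a random shadow residual as the slides recommend:

```python
import numpy as np

def bicg(A, b, tol=1e-8, maxit=500, seed=0):
    """Classical Bi-CG (real arithmetic). Shadow quantities are driven
    by A^T; two MVs per step, one with A and one with A^T."""
    rng = np.random.default_rng(seed)
    rt = rng.standard_normal(b.size)  # shadow residual r~
    x = np.zeros_like(b)
    r = b.copy()
    u = np.zeros_like(b)
    ut = np.zeros_like(b)             # shadow direction u~
    rho = 1.0
    while np.linalg.norm(r) > tol and maxit > 0:
        maxit -= 1
        rho_old = rho
        rho = rt @ r
        beta = rho / rho_old
        u = r + beta * u
        ut = rt + beta * ut
        c = A @ u                     # MV with A
        ct = A.T @ ut                 # MV with A^T
        alpha = rho / (ut @ c)
        r = r - alpha * c
        rt = rt - alpha * ct
        x = x + alpha * u
    return x
```

Note that only the update of x uses u; the shadow vectors serve purely to produce the scalars α and β.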
Selecting the initial shadow residual r̃_0.
Often recommended: r̃_0 = r_0. Practical experience: select r̃_0 randomly (unless A = A^*).
Exercise. Bi-CG and CG coincide if A is Hermitian and r̃_0 = r_0.
Exercise. Derive a version of Bi-CG that includes a preconditioner K. Show that Bi-CG and CG coincide if A and K are Hermitian and r̃_0 = K^{-1} r_0.
Exercise 8.9 gives an alternative derivation of Bi-CG.
Properties Bi-CG
Pros
• Usually selects good approximations from the search subspaces (Krylov subspaces).
• Low costs per step: 2 DOT, 5 AXPY.
• Low storage: 8 large vectors.
• No knowledge of properties of A is needed.
Cons
• Non-optimal Krylov subspace method.
• Not robust: Bi-CG may break down.
• Bi-CG is sensitive to evaluation errors (often loss of super-linear convergence).
• Convergence depends on the shadow residual r̃_0.
• 2 MVs needed to expand the search subspace by 1 vector; 1 MV is by A^*.
Program Lecture 8 — next: Bi-Lanczos
Bi-Lanczos
Find coefficients α_k, β_k, α̃_k and β̃_k such that (bi-orthogonalize)
γ_k v_{k+1} = A v_k − α_k v_k − β_k v_{k−1} ⊥ w_k, w_{k−1}, ...
γ̃_k w_{k+1} = A^* w_k − α̃_k w_k − β̃_k w_{k−1} ⊥ v_k, v_{k−1}, ...
Select appropriate scaling coefficients γ_k and γ̃_k. Then
A V_k = V_{k+1} H_k with H_k Hessenberg,
A^* W_k = W_{k+1} H̃_k with H̃_k Hessenberg, and W_{k+1}^* V_{k+1} = D_{k+1} diagonal.
Exercise. T_k ≡ W_k^* A V_k = D_k H_k = H̃_k^* D_k is tridiagonal. Exploit H_k = D_k^{-1} H̃_k^* D_k and the tridiagonal structure: Bi-Lanczos.
See Exercise 8.7 for details.
[Lanczos '50] Lanczos
ρ = ‖r_0‖, v_1 = r_0/ρ, β_0 = 0, v_0 = 0
for k = 1, 2, ... do
  ṽ = A v_k − β_{k−1} v_{k−1}
  α_k = v_k^* ṽ, ṽ ← ṽ − α_k v_k
  β_k = ‖ṽ‖, v_{k+1} = ṽ/β_k
end for
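A compact NumPy version of the Lanczos loop (my own sketch; it also assembles the extended tridiagonal T_k so that the decomposition A V_k = V_{k+1} T_k can be checked numerically):

```python
import numpy as np

def lanczos(A, r0, k):
    """Hermitian Lanczos: returns V (n x (k+1)) with orthonormal columns
    and the (k+1) x k tridiagonal T such that A V[:, :k] = V @ T.
    Breakdown (beta_k = 0) is not handled in this sketch."""
    n = r0.size
    V = np.zeros((n, k + 1))
    T = np.zeros((k + 1, k))
    V[:, 0] = r0 / np.linalg.norm(r0)
    beta = 0.0
    for j in range(k):
        v = A @ V[:, j]
        if j > 0:
            v = v - beta * V[:, j - 1]   # subtract beta_{k-1} v_{k-1}
            T[j - 1, j] = beta
        alpha = V[:, j] @ v
        v = v - alpha * V[:, j]
        beta = np.linalg.norm(v)
        T[j, j] = alpha
        T[j + 1, j] = beta
        V[:, j + 1] = v / beta
    return V, T
```

In floating point the orthogonality of V degrades with k; for a few steps it is still close to machine precision.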
[Lanczos '50] Bi-Lanczos
Select an r_0 and an r̃_0.
v_1 = r_0/‖r_0‖, v_0 = 0, w_1 = r̃_0/‖r̃_0‖, w_0 = 0
γ_0 = 0, δ_0 = 1, γ̃_0 = 0, δ̃_0 = 1
for k = 1, 2, ... do
  δ_k = w_k^* v_k
  ṽ = A v_k, β_k = γ̃_{k−1} δ_k/δ_{k−1}, ṽ ← ṽ − β_k v_{k−1}
  α_k = w_k^* ṽ/δ_k, ṽ ← ṽ − α_k v_k
  w̃ = A^* w_k, β̃_k = γ̄_{k−1} δ̄_k/δ̄_{k−1}, w̃ ← w̃ − β̃_k w_{k−1}
  α̃_k = ᾱ_k, w̃ ← w̃ − α̃_k w_k
  select γ_k ≠ 0 and γ̃_k ≠ 0
  v_{k+1} = ṽ/γ_k, w_{k+1} = w̃/γ̃_k
  V_k = [V_{k−1}, v_k], W_k = [W_{k−1}, w_k]
end for
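The same loop in NumPy, real arithmetic, with the unit-norm scaling γ_k = ‖ṽ‖, γ̃_k = ‖w̃‖ — a sketch of mine to make the bi-orthogonality W^* V = D (diagonal) checkable; breakdowns (δ_k = 0) are ignored:

```python
import numpy as np

def bilanczos(A, r0, rt0, k):
    """Bi-Lanczos (real arithmetic): returns V, W (n x k) with
    W^T V diagonal, built by the three-term recurrences of the slide."""
    n = r0.size
    V = np.zeros((n, k))
    W = np.zeros((n, k))
    V[:, 0] = r0 / np.linalg.norm(r0)
    W[:, 0] = rt0 / np.linalg.norm(rt0)
    gamma = gtil = 0.0
    delta_old = 1.0
    for j in range(k - 1):
        delta = W[:, j] @ V[:, j]
        vt = A @ V[:, j]
        wt = A.T @ W[:, j]
        if j > 0:  # slide: beta_k = gtil_{k-1} delta_k / delta_{k-1}
            vt = vt - (gtil * delta / delta_old) * V[:, j - 1]
            wt = wt - (gamma * delta / delta_old) * W[:, j - 1]
        alpha = (W[:, j] @ vt) / delta
        vt = vt - alpha * V[:, j]
        wt = wt - alpha * W[:, j]
        gamma = np.linalg.norm(vt)   # scaling gamma_k
        gtil = np.linalg.norm(wt)    # scaling gtil_k
        V[:, j + 1] = vt / gamma
        W[:, j + 1] = wt / gtil
        delta_old = delta
    return V, W
```

The diagonal shift in the test below merely keeps the δ_k safely away from zero for this small random example.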
Arnoldi: A V_k = V_{k+1} H_k. If A = A^*, then T_k ≡ H_k is tridiagonal: Lanczos.
Lanczos + T = LU + efficient implementation → CG.
Bi-Lanczos + T = LU + efficient implementation → Bi-CG.
Bi-CG may break down
0) Lucky breakdown if r_k = 0.
1) Pivot breakdown or LU-breakdown: the LU-decomposition may not exist. Corresponds to σ = 0 in Bi-CG.
Remedy. Composite step Bi-CG (skip once when forming T_k = L_k U_k), or form T = QR as in MINRES (from the beginning): simple → Quasi Minimal Residuals (QMR).
2) Bi-Lanczos may break down: a diagonal element of D_k may be zero. Corresponds to ρ = 0 in Bi-CG.
Remedy. Look-ahead (look-ahead in QMR).
General remedy. Restart.
Note. CG may suffer from pivot breakdown when applied to a Hermitian, non-definite matrix (A = A^* with positive as well as negative eigenvalues): MINRES and SYMMLQ cure this breakdown.
Note. Exact breakdowns are rare. However, near-breakdowns lead to irregular convergence and instabilities. This leads to loss of speed of convergence and loss of accuracy.
Properties Bi-CG (recap). Non-optimal; may break down; sensitive to evaluation errors; convergence depends on the shadow residual r̃_0; 2 MVs per expansion of the search subspace, one of them by A^*.
Program Lecture 8 — next: Hybrid Bi-CG
Bi-Conjugate Gradients, K = I
u_k = r_k − β_k u_{k−1}, r_{k+1} = r_k − α_k A u_k; r_k, u_k ∈ K_{k+1}(A, r_0), r_k, A u_k ⊥ r̃_{k−1}.
With ρ_k ≡ (r_k, q_k(A^*) r̃_0) and σ_k ≡ (A u_k, q_k(A^*) r̃_0), and, since q_k(ζ) + ϑ_k ζ q_{k−1}(ζ) ∈ P_{k−1} for some ϑ_k, we have that
α_k = ρ_k/σ_k  and  β_k = −ρ_k/(ϑ̄_k σ_{k−1}).
Transpose-free Bi-CG [Sonneveld '89]
ρ_k = (r_k, q_k(A^*) r̃_0) = (q_k(A) r_k, r̃_0), σ_k = (A u_k, q_k(A^*) r̃_0) = (A q_k(A) u_k, r̃_0). Write Q_k ≡ q_k(A).
(Bi-CG) ρ_k; Q_k u_k = Q_k r_k − β_k Q_k u_{k−1}; σ_k; Q_k r_{k+1} = Q_k r_k − α_k A Q_k u_k.
(Pol) Compute q_{k+1} of degree k+1 s.t. q_{k+1}(0) = 1. Compute Q_{k+1} u_k, Q_{k+1} r_{k+1} (from Q_k u_k, Q_k r_{k+1}, ...).
Example. q_{k+1}(ζ) = (1 − ω_k ζ) q_k(ζ).
Example. q_{k+1}(ζ) = (1 − ω_k ζ) q_k(ζ) (ζ ∈ C). Then
ω_k; Q_{k+1} u_k = Q_k u_k − ω_k A Q_k u_k, Q_{k+1} r_{k+1} = Q_k r_{k+1} − ω_k A Q_k r_{k+1}.
Transpose-free Bi-CG: practice
Work with u_k ≡ Q_k u_k^BiCG and r_k ≡ Q_k r_{k+1}^BiCG; write u_{k−1} and r_k instead of Q_k u_{k−1}^BiCG and Q_k r_k^BiCG, respectively. Then ρ_k = (r_k, r̃_0), σ_k = (A u_k, r̃_0).
(Bi-CG) ρ_k = (r_k, r̃_0); u_k = r_k − β_k u_{k−1}; σ_k = (A u_k, r̃_0); r_k ← r_k − α_k A u_k, x_k ← x_k + α_k u_k.
(Pol) Compute the updating coefficients for q_{k+1}. Compute u_k, r_{k+1}, x_{k+1}.
Example. ω_k; u_{k+1} = u_k − ω_k A u_k, r_{k+1} = r_k − ω_k A r_k, x_{k+1} = x_k + ω_k r_k.
Example. q_{k+1}(ζ) = (1 − ω_k ζ) q_k(ζ) (ζ ∈ C). How to choose ω_k?
Bi-CGSTABilized. With s_k ≡ A r_k,
ω_k = argmin_ω ‖r_k − ω A r_k‖_2 = (s_k^* r_k)/(s_k^* s_k),
as in the Local Minimal Residual method, or, equivalently, as in GCR(1).
BiCGSTAB [van der Vorst '92]
x = 0, r = b, u = 0, ω = σ = 1. Choose r̃.
while ‖r‖ > tol do
  σ ← −ωσ, ρ = (r, r̃), β = ρ/σ
  u ← r − β u, c = A u
  σ = (c, r̃), α = ρ/σ
  r ← r − α c, x ← x + α u
  s = A r, ω = (r, s)/(s, s)
  u ← u − ω c
  x ← x + ω r
  r ← r − ω s
end while
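For reference, the same method in its standard textbook form in NumPy — the slide folds these recurrences into fewer vectors via the σ ← −ωσ bookkeeping; this sketch of mine keeps the familiar p, v, s, t naming:

```python
import numpy as np

def bicgstab(A, b, tol=1e-8, maxit=500, seed=0):
    """BiCGSTAB: one Bi-CG step + one degree-1 minimal-residual step
    per iteration, two MVs, no products with A^T."""
    rng = np.random.default_rng(seed)
    rt = rng.standard_normal(b.size)   # shadow residual
    x = np.zeros_like(b)
    r = b.copy()
    p = np.zeros_like(b)
    v = np.zeros_like(b)
    rho = alpha = omega = 1.0
    while np.linalg.norm(r) > tol and maxit > 0:
        maxit -= 1
        rho_old = rho
        rho = rt @ r
        beta = (rho / rho_old) * (alpha / omega)
        p = r + beta * (p - omega * v)
        v = A @ p                       # first MV: Bi-CG part
        alpha = rho / (rt @ v)
        s = r - alpha * v
        t = A @ s                       # second MV: stabilization part
        omega = (t @ s) / (t @ t)       # argmin_w || s - w t ||_2
        x = x + alpha * p + omega * s
        r = s - omega * t
    return x
```

Each iteration expands the search subspace by two, matching the two MVs.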
Hybrid Bi-CG or product-type Bi-CG
r_k ≡ q_k(A) r_k^BiCG = q_k(A) p_k^BiCG(A) r_0, where p_k^BiCG is the kth Bi-CG residual polynomial.
How to select q_k? q_k for efficient steps & fast convergence. Fast convergence by:
• reducing the residual;
• stabilizing the Bi-CG part;
• other: when used as linear solver for the Jacobian system in a Newton scheme for non-linear equations, by reducing the number of Newton steps.
Hybrid Bi-CG examples.
CGS: Bi-CG × Bi-CG — Sonneveld [1989]
Bi-CGSTAB: GCR(1) × Bi-CG — van der Vorst [1992]
GPBi-CG: 2-truncated GCR × Bi-CG — Zhang [1997]
BiCGstab(l): GCR(l) × Bi-CG — Sleijpen–Fokkema [1993]
For more details on hybrid Bi-CG, see Exercise 8.11 and Exercise 8.12. For a derivation of GPBi-CG, see Exercise 8.13.
Properties hybrid Bi-CG
Pros
• Converges often twice as fast as Bi-CG w.r.t. # MVs: each MV expands the search subspace. Bi-CG: x_k − x_0 ∈ K_k(A; r_0) after 2k MVs. Hybrid Bi-CG: x_k − x_0 ∈ K_{2k}(A; r_0) after 2k MVs.
• Work/MV and storage similar to Bi-CG.
• Transpose free.
• Explicit computation of the Bi-CG scalars.
Cons
• Non-optimal Krylov subspace method.
• Peaks in the convergence history.
• Large intermediate residuals.
• Breakdown possibilities.
Conjugate Gradients Squared [Sonneveld '89]
r_k = p_k^BiCG(A) p_k^BiCG(A) r_0.
CGS exploits recurrence relations for the Bi-CG polynomials to design a very efficient algorithm.
Properties
+ Hybrid Bi-CG.
+ A very efficient algorithm: 1 DOT/MV, 3.25 AXPY/MV; storage: 7 large vectors.
− Often high peaks in its convergence history.
− Often large intermediate residuals.
+ Seems to do well as linear solver in a Newton scheme.
[Sonneveld '89] Conjugate Gradients Squared
x = 0, r = b, u = w = 0, ρ = 1. Choose r̃.
while ‖r‖ > tol do
  σ = −ρ, ρ = (r, r̃), β = ρ/σ
  w ← u − β w
  v = r − β u
  w ← v − β w, c = A w
  σ = (c, r̃), α = ρ/σ
  u = v − α c
  r ← r − α A(v + u)
  x ← x + α (v + u)
end while
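The same recurrences in the standard Sonneveld naming (u, p, q) — a NumPy sketch of mine; the variable names differ from the slide but the squared-polynomial residual r_k = p_k(A)² r_0 is the same:

```python
import numpy as np

def cgs(A, b, tol=1e-8, maxit=500, seed=0):
    """CGS: transpose-free, two MVs per step, both expanding the
    search subspace (hence 'twice as fast as Bi-CG per MV')."""
    rng = np.random.default_rng(seed)
    rt = rng.standard_normal(b.size)   # shadow residual
    x = np.zeros_like(b)
    r = b.copy()
    p = np.zeros_like(b)
    q = np.zeros_like(b)
    rho = 1.0
    while np.linalg.norm(r) > tol and maxit > 0:
        maxit -= 1
        rho_old = rho
        rho = rt @ r
        beta = rho / rho_old
        u = r + beta * q
        p = u + beta * (q + beta * p)
        v = A @ p                      # first MV
        alpha = rho / (rt @ v)
        q = u - alpha * v
        w = u + q
        x = x + alpha * w
        r = r - alpha * (A @ w)        # second MV
    return x
```

The squared polynomial is what makes CGS fast but also what amplifies its peaks.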
Program Lecture 8 — next: Bi-CGSTAB, BiCGstab(l)
Properties Bi-CGSTAB
Pros
• Hybrid Bi-CG.
• Converges faster (& smoother) than CGS.
• More accurate than CGS.
• 2 DOT/MV, 3 AXPY/MV. Storage: 6 large vectors.
Cons
• Danger of (A) Lanczos breakdown (ρ_k = 0), (B) pivot breakdown (σ_k = 0), (C) breakdown of the minimization (ω_k = 0).
[Convergence history: log10 of residual norm vs. # MVs for Bi-CG, BiCGSTAB, GCR(100), 100-truncated GCR.]
−(a u_x)_x − (a u_y)_y = 1 on [0,1]×[0,1]; a = 1000 for 0.1 ≤ x, y ≤ 0.9 and a = 1 elsewhere. Dirichlet BC on y = 0, Neumann BC on the other parts of the boundary. 82×82 volumes. ILU decomposition.
[Figure: definition of the coefficient a = A and the source f = F on [0,1]×[0,1]: boundary values u = 0 and u = 1 on parts of the boundary; F = 100 in a subregion, F = 0 elsewhere; A takes the values 10^4, 10^{−5} and 10^2 in subregions, definition of a = A and of f = F.]
−(a u_x)_x − (a u_y)_y + b u_x = f on [0,1]×[0,1].
[Convergence history: log10 of residual norm vs. number of matrix multiplications for Bi-CG (dash-dot), Bi-CGSTAB (dashed), BiCGstab(2) (solid).]
−(a u_x)_x − (a u_y)_y + b u_x = f on [0,1]×[0,1]; b(x,y) = 2 exp(2(x² + y²)), a changes strongly. Dirichlet BC. 129×129 volumes. ILU decomposition.
[Same experiment on a finer grid: log10 of residual norm vs. number of matrix multiplications for Bi-CG (dash-dot), Bi-CGSTAB (dashed), BiCGstab(2) (solid).]
−(a u_x)_x − (a u_y)_y + b u_x = f on [0,1]×[0,1]; b(x,y) = 2 exp(2(x² + y²)), a changes strongly. Dirichlet BC. 201×201 volumes. ILU decomposition.
Breakdown of the minimization
Exact arithmetic, ω_k = 0: no reduction of the residual by Q_{k+1} r_{k+1} = (I − ω_k A) Q_k r_{k+1}^BiCG (*); q_{k+1} is of degree k: the Bi-CG scalars cannot be computed; breakdown of the incorporated Bi-CG.
Finite precision arithmetic, ω_k ≈ 0: poor reduction of the residual by (*); the Bi-CG scalars are seriously affected by evaluation errors: drop of the speed of convergence.
ω_k ≈ 0 is to be expected if A is real and A has eigenvalues with relatively large imaginary part: ω_k is real!
BiCGstab(l). Cycle l times through the Bi-CG part to compute A^j u', A^j r' for j = 0, ..., l, where now u' ≡ Q_k u_{k+l−1}^BiCG and r' ≡ Q_k r_{k+l}^BiCG for k = ml. Then
γ_m = argmin_γ ‖r' − [A r', ..., A^l r'] γ‖_2.
r_{k+l} = r' − [A r', ..., A^l r'] γ_m,
q_{k+l}(ζ) = (1 − [ζ, ..., ζ^l] γ_m) q_k(ζ)  (ζ ∈ C).
BiCGstab(l) for l ≥ 2 [Sleijpen–Fokkema '93, Sleijpen–van der Vorst–Fokkema '94]
q_{k+1}(A) = A q_k(A) for k ≠ ml; q_{ml+l}(A) = ϕ_m(A) q_{ml}(A),
where ϕ_m is of exact degree l, ϕ_m(0) = 1, and, for k = ml, ϕ_m minimizes ‖ϕ_m(A) q_{ml}(A) r_{ml+l}^BiCG‖_2 = ‖ϕ_m(A) r'‖_2.
ϕ_m is a GCR residual polynomial of degree l. Note that real polynomials of degree ≥ 2 can have complex zeros.
Minimization in practice: ϕ_m(ζ) = 1 − Σ_{j=1}^{l} γ_j^{(m)} ζ^j,
(γ_j^{(m)})_j = argmin_{(γ_j)} ‖r' − Σ_{j=1}^{l} γ_j A^j r'‖_2.
Compute A r', A² r', ..., A^l r' explicitly. With R ≡ [A r', ..., A^l r'] and γ_m ≡ (γ_1^{(m)}, ..., γ_l^{(m)})^T we have the normal equations (use Cholesky):
(R^* R) γ_m = R^* r'.
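The normal-equations step above is small (l × l) and cheap; as a NumPy sketch of mine:

```python
import numpy as np

def mr_poly_coeffs(A, r, ell):
    """Coefficients gamma minimizing || r - [A r, ..., A^ell r] gamma ||_2
    via the normal equations (R^T R) gamma = R^T r, as in BiCGstab(ell)."""
    R = np.empty((r.size, ell))
    v = r.copy()
    for j in range(ell):
        v = A @ v                 # build A r, A^2 r, ..., A^ell r
        R[:, j] = v
    G = R.T @ R                   # ell x ell Gram matrix (SPD)
    return np.linalg.solve(G, R.T @ r)   # Cholesky would also do
```

The minimizer leaves the residual r − R γ orthogonal to the columns of R, which is exactly what the normal equations express.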
BiCGstab(l)
x = 0, r = [b]. Choose r̃. u = [0], γ_l = σ = 1.
while ‖r‖ > tol do
  σ ← −γ_l σ
  for j = 1 to l do
    ρ = (r_j, r̃), β = ρ/σ
    u ← r − β u, u ← [u, A u_j]
    σ = (u_{j+1}, r̃), α = ρ/σ
    r ← r − α u_{2:j+1}, r ← [r, A r_j]
    x ← x + α u_1
  end for
  R ≡ r_{2:l+1}. Solve (R^* R) γ = R^* r_1 for γ.
  x ← x + (γ_1 r_1 + ... + γ_l r_l)
  u ← [u_1 − (γ_1 u_2 + ... + γ_l u_{l+1})]
  r ← [r_1 − (γ_1 r_2 + ... + γ_l r_{l+1})]
end while
% BiCGstab(ell) in MATLAB; MV(y) should return A*y.
% Note: the x-update must use the residual block r before it is
% collapsed to a single column (the original listing did this too late).
epsilon = 10^(-16); ell = 4;
x = zeros(size(b)); rt = rand(size(b));
sigma = 1; omega = 1; u = zeros(size(b));
y = MV(x); r = b - y;
nrm = r'*r; nepsilon = nrm*epsilon^2;   % compare squared norms
L = 2:ell+1;
while nrm > nepsilon
  sigma = -omega*sigma; y = r;
  for j = 1:ell
    rho = rt'*y; beta = rho/sigma;
    u = r - beta*u;
    y = MV(u(:,j)); u(:,j+1) = y;
    sigma = rt'*y; alpha = rho/sigma;
    r = r - alpha*u(:,2:j+1);
    x = x + alpha*u(:,1);
    y = MV(r(:,j)); r(:,j+1) = y;
  end
  G = r'*r;                        % Gram matrix; normal equations
  gamma = G(L,L)\G(L,1); omega = gamma(ell);
  x = x + r*[gamma; 0];            % uses the old columns r_1, ..., r_ell
  u = u*[1; -gamma]; r = r*[1; -gamma];
  nrm = r'*r;
end
[Convergence history: log10 of residual norm vs. number of matrix multiplications for Bi-CG (dash-dot), Bi-CGSTAB (dashed), BiCGstab(2) (solid).]
u_xx + u_yy + u_zz + 1000 u_x = f; f such that u(x,y,z) = exp(xyz) sin(πx) sin(πy) sin(πz). 52×52×52 volumes. No preconditioning.
[Convergence history: log10 of residual norm vs. number of matrix multiplications for BiCGStab2 (dash-dot), Bi-CGSTAB (dotted), BiCGstab(2) (dashed), BiCGstab(4) (solid).]
−(a u_x)_x − (a u_y)_y = 1 on [0,1]×[0,1]; a = 1000 for 0.1 ≤ x, y ≤ 0.9 and a = 1 elsewhere. Dirichlet BC on y = 0, Neumann BC on the other parts of the boundary. 200×200 volumes. ILU decomposition.
[Convergence history: log10 of residual norm vs. number of matrix multiplications for BiCGStab2 (dash-dot), Bi-CGSTAB (dotted), BiCGstab(2) (dashed), BiCGstab(4) (solid).]
−ϵ(u_xx + u_yy) + a(x,y) u_x + b(x,y) u_y = 0 on [0,1]×[0,1], Dirichlet BC; ϵ = 10^{−1}, a(x,y) = 4x(x−1)(1−2y), b(x,y) = 4y(1−y)(1−2x), u(x,y) = sin(πx) + sin(13πx) + sin(πy) + sin(13πy). 201×201 volumes, no preconditioning.
[Convergence history: log10 of residual norm vs. number of matrix multiplications.]
u_xx + u_yy + u_zz + 1000 u_x = f; f is defined by the solution u(x,y,z) = exp(xyz) sin(πx) sin(πy) sin(πz). 10×10×10 volumes. No preconditioning.
Accurate Bi-CG coefficients
ρ_k = (r_k, r̃_0) is computed as ρ'_k = ρ_k (1 + ϵ) with
|ϵ| ≤ n ξ ‖r_k‖_2 ‖r̃_0‖_2 / |(r_k, r̃_0)| = n ξ / ρ̂_k, where ρ̂_k ≡ |(r_k, r̃_0)| / (‖r_k‖_2 ‖r̃_0‖_2).
[Convergence history.]
Why use polynomial factors of degree ≥ 2?
Hybrid Bi-CG that is faster than Bi-CGSTAB: 1 sweep of BiCGstab(l) versus l steps of Bi-CGSTAB:
• reduction with an MR-polynomial of degree l is better than with l MR-polynomials of degree 1;
• the MR-polynomial of degree l contributes only once to an increase of ρ̂_k.
Why not?
• Efficiency: 1.75 + 0.25 l DOT/MV, 2.5 + 0.5 l AXPY/MV; storage: 2l + 5 large vectors.
• Loss of accuracy: the gap ‖r_k − (b − A x_k)‖ grows with terms like c ξ max_i |γ_i| ‖A‖ ‖A^{i−1} r'‖.
• Breakdowns are possible.
Properties Bi-CG (recap). Non-optimal; may break down; sensitive to evaluation errors; convergence depends on the shadow residual r̃_0; 2 MVs per expansion of the search subspace, one of them by A^*.
Program Lecture 8 — next: IDR
Hybrid Bi-CG
Notation. If p_k is a polynomial of exact degree k and r̃_0 an n-vector, let
S(p_k, A, r̃_0) ≡ {p_k(A) v : v ⊥ K_k(A^*, r̃_0)}.
Theorem. Hybrid Bi-CG finds residuals r_k ∈ S(p_k, A, r̃_0).
Example. Bi-CGSTAB: p_k(λ) = (1 − ω_k λ) p_{k−1}(λ), where, in every step,
ω_k = argmin_ω ‖r − ω A r‖_2, with r = p_{k−1}(A) v, v = r_k^BiCG.
Example. BiCGstab(l): p_k(λ) = (1 − ω_k λ) p_{k−1}(λ), where, every lth step,
γ = argmin_γ ‖r − [A r, ..., A^l r] γ‖_2, with r = p_{k−l}(A) r_k^BiCG, and
(1 − γ_1 λ − ... − γ_l λ^l) = (1 − ω_k λ) ··· (1 − ω_{k−l+1} λ).
Induced Dimension Reduction
Definition. If p_k is a polynomial of exact degree k and R̃ ≡ R̃_0 = [r̃_1, ..., r̃_s] an n×s matrix, then
S(p_k, A, R̃) ≡ {p_k(A) v : v ⊥ K_k(A^*, R̃)}
is the p_k-Sonneveld subspace. Here K_k(A^*, R̃) ≡ {Σ_{j=0}^{k−1} (A^*)^j R̃ γ_j : γ_j ∈ C^s}.
Theorem. IDR finds residuals r_k ∈ S(p_k, A, R̃).
Example. Bi-CGSTAB (s = 1): p_k(λ) = (1 − ω_k λ) p_{k−1}(λ), where, in every step, ω_k = argmin_ω ‖r − ω A r‖_2, with r = p_{k−1}(A) v, v = r_k^BiCG.
[van Gijzen–Sonneveld '07] IDR
Select an x_0. Select n×s matrices U and R̃. Compute C ≡ A U.
x = x_0, r = b − A x, j = s, i = 1
while ‖r‖ > tol do
  solve R̃^* C γ = R̃^* r for γ
  v = r − C γ, t = A v  (t written here for the slide's s = A v, to avoid a clash with the block size s)
  j++; if j > s: ω = t^* v / (t^* t), j = 0
  U e_i ← U γ + ω v, x ← x + U e_i
  r_old = r, r ← v − ω t, C e_i ← r_old − r
  i++; if i > s: i = 1
end while
Select n×s matrices U and R̃. Experiments suggest R̃ = qr(rand(n, s), 0), i.e., an orthonormalized random matrix. U and C can be constructed from s steps of GCR.
We will discuss IDR in more detail in Lecture 11. See also Exercises 11.1–11.5.