Fast iterative solvers
1 Utrecht, 15 November 2017. Fast iterative solvers. Gerard Sleijpen, Department of Mathematics. sleij101/
2 Review. To simplify, take $x_0 = 0$ and assume $\|b\|_2 = 1$. Solving $Ax = b$ for a general square matrix $A$.

Arnoldi decomposition: $AV_k = V_{k+1}\underline{H}_k$ with $V_k$ orthonormal, $\underline{H}_k$ Hessenberg; $H_k = L_k U_k = Q_k R_k$; $v_1 = r_0 = b$, $x_k = V_k y_k$, $r_k = b - A x_k$.

FOM: $r_k \perp V_k \Leftrightarrow H_k y_k = e_1 \Rightarrow x_k = V_k (U_k^{-1}(L_k^{-1} e_1))$.

GMRES: $y_k = \operatorname{argmin}_y \|b - A V_k y\|_2 = \operatorname{argmin}_y \|e_1 - \underline{H}_k y\|_2 \Rightarrow x_k = V_k (R_k^{-1}(Q_k^* e_1))$.

The columns of $V_k$ span $\mathcal{K}_k(A, r_0) \equiv \{p(A) r_0 \mid p \in \mathcal{P}_{k-1}\}$, and $\mathcal{K}_k(A, r_0) = \operatorname{span}(r_0, r_1, \ldots, r_{k-1})$.
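As an illustration of the review above, a minimal MATLAB sketch of GMRES built directly on the Arnoldi decomposition (the function form and names are mine; breakdown checks, restarts and preconditioning are omitted). The small least-squares problem min_y ‖βe_1 − H̲_k y‖_2 is solved with backslash rather than with the Givens/QR updates used in practice.

  function x = gmres_sketch(A, b, m, tol)
    % Arnoldi: A*V(:,1:k) = V(:,1:k+1)*H(1:k+1,1:k)
    n = length(b); V = zeros(n, m+1); H = zeros(m+1, m);
    beta = norm(b); V(:,1) = b/beta;            % x0 = 0, so r0 = b
    for k = 1:m
      w = A*V(:,k);
      for j = 1:k                               % modified Gram-Schmidt
        H(j,k) = V(:,j)'*w; w = w - H(j,k)*V(:,j);
      end
      H(k+1,k) = norm(w); V(:,k+1) = w/H(k+1,k);
      e1 = [beta; zeros(k,1)];
      y = H(1:k+1,1:k)\e1;                      % min || beta*e1 - H_k*y ||_2
      if norm(e1 - H(1:k+1,1:k)*y) < tol, break; end
    end
    x = V(:,1:k)*y;                             % x_k = V_k*y_k
  end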
3 Review. To simplify, take $x_0 = 0$ and assume $\|b\|_2 = 1$. Solving $Ax = b$ for a Hermitian matrix $A$.

Lanczos decomposition: $AV_k = V_{k+1}\underline{T}_k$ with $V_k$ orthonormal, $\underline{T}_k$ tridiagonal; $T_k = L_k U_k = Q_k R_k$; $v_1 = r_0 = b$, $x_k = V_k y_k$, $r_k = b - A x_k$.

CG: $r_k \perp V_k \Leftrightarrow T_k y_k = e_1 \Rightarrow x_k = (V_k U_k^{-1})(L_k^{-1} e_1)$.

MINRES: $y_k = \operatorname{argmin}_y \|b - A V_k y\|_2 = \operatorname{argmin}_y \|e_1 - \underline{T}_k y\|_2 \Rightarrow x_k = (V_k R_k^{-1})(Q_k^* e_1)$.

Note. Short recurrences (efficient computation), because $\underline{T}_k$ is tridiagonal, and $\widetilde{U}_k U_k = V_k$ (CG) and $W_k R_k = V_k$ (MINRES).
4 (Slide 3 repeated.) Details. For MINRES, see Exercise 7.8.
5 (Slide 3 repeated.) Note. Less stable: we rely on mathematics (exact arithmetic) for the orthogonality of $V_k$ (CG & MINRES), and $W_k R_k = V_k$ (MINRES) is a three-term vector recurrence for the $w_k$'s.
6 (Slide 3 repeated.) Note. If $A$ is positive definite, then CG minimizes as well: $y_k = \operatorname{argmin}_y \|x - V_k y\|_A = \operatorname{argmin}_y \|b - A V_k y\|_{A^{-1}}$.
7 (Slide 3 repeated.) SYMMLQ. Take $x_k = A V_k y_k$ with $y_k = \operatorname{argmin}_y \|x - A V_k y\|_2 \Leftrightarrow e_1 - \underline{T}_k^*\underline{T}_k y_k = 0 \Rightarrow x_k = V_{k+1}\underline{T}_k(\underline{T}_k^*\underline{T}_k)^{-1} e_1 = (V_{k+1} Q_k)((R_k^*)^{-1} e_1)$.
8 (Slide 3 repeated.) Details. For SYMMLQ, see Exercise 7.9.
9 FOM residual polynomials and Ritz values.

Property. $r_k^{\mathrm{CG}} = r_k^{\mathrm{FOM}} = p_k(A) r_0 \perp V_k$ for some residual polynomial $p_k$ of degree $k$, i.e., $p_k(0) = 1$; $p_k^{\mathrm{FOM}} \equiv p_k$ is the $k$th (CG or) FOM residual polynomial.

Theorem. $p_k^{\mathrm{FOM}}(\vartheta) = 0 \Leftrightarrow H_k y = \vartheta y$ for some $y \neq 0$: the zeros of $p_k^{\mathrm{FOM}}$ are precisely the $k$th order Ritz values. In particular, $p_k^{\mathrm{FOM}}(\lambda) = \prod_{j=1}^{k}\bigl(1 - \frac{\lambda}{\vartheta_j}\bigr)$ ($\lambda \in \mathbb{C}$). Moreover, for a polynomial $p$ of degree at most $k$ with $p(0) = 1$, we have that $p = p_k^{\mathrm{FOM}} \Leftrightarrow p(H_k) = 0$.

Proof. Exercise 6.6.
10 (Slide 9 repeated.) Theorem. Similarly, the zeros of $p_k^{\mathrm{GMRES}}$ relate to the harmonic Ritz values of $H_k$.
11 Solving $Ax = b$, an overview. The flowchart on this slide selects a solver as follows (legend: "good precond." = a good preconditioner is available; "flex. precond." = the preconditioner is flexible; "str. indef." = $A^* + A$ is strongly indefinite; "large im. eig." = $A$ has large imaginary eigenvalues):

- $A = A^*$? yes: $A > 0$? yes → CG; no: ill-conditioned? yes → SYMMLQ, no → MINRES.
- $A \neq A^*$, good preconditioner available? yes: flexible preconditioner? yes → GCR, no → GMRES.
- $A \neq A^*$, no good preconditioner: $A^* + A$ strongly indefinite? no: large imaginary eigenvalues? no → Bi-CGSTAB, yes → BiCGstab(ℓ); yes: large imaginary eigenvalues? no → IDR, yes → IDRstab.
16 $Ax = b$ with $A$ an $n \times n$ non-singular matrix. Today's topic: iterative methods for general systems using short recurrences.
17 Program Lecture 8: CG; Bi-CG; Bi-Lanczos; Hybrid Bi-CG; Bi-CGSTAB, BiCGstab(ℓ); IDR.
18 Program Lecture 8 (next: CG).
19 $A = A^* > 0$: Conjugate Gradients [Hestenes–Stiefel '52]

  x = 0, r = b, u = 0, ρ = 1
  while ‖r‖ > tol do
    σ = −ρ, ρ = r*r, β = ρ/σ
    u ← r − β u, c = A u
    σ = u*c, α = ρ/σ
    r ← r − α c
    x ← x + α u
  end while
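A direct MATLAB transcription of the box above (function form and names are mine; tol is an absolute residual tolerance). Note the sign convention of these notes: σ = −ρ makes β negative, so that u ← r − βu is the usual conjugate-direction update.

  function x = cg_sketch(A, b, tol)
    % CG for Hermitian positive definite A, starting from x0 = 0
    x = zeros(size(b)); r = b; u = zeros(size(b)); rho = 1;
    while norm(r) > tol
      sigma = -rho; rho = r'*r; beta = rho/sigma;  % beta = -rho_new/rho_old
      u = r - beta*u;                              % new search direction
      c = A*u;                                     % the one MV per step
      sigma = u'*c; alpha = rho/sigma;             % alpha = rho/(u'*A*u)
      r = r - alpha*c;
      x = x + alpha*u;
    end
  end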
20 Construction of CG. There are four alternative derivations of CG:
- GCR (use $A = A^*$) → CR; use the $A^{-1}$ inner product + efficient implementation.
- Lanczos + $T = LU$ + efficient implementation.
- Orthogonalize the residuals.
- Nonlinear CG to solve $x = \operatorname{argmin}_{\widetilde x} \frac{1}{2}\|b - A\widetilde x\|_{A^{-1}}^2$ ... [Exercise 7.3]
21 Conjugate Gradients, $A = A^*$, $K = K^*$:
$u_k = K^{-1} r_k - \beta_k u_{k-1}$, $r_{k+1} = r_k - \alpha_k A u_k$.
22 Theorem. $r_k, K u_k \in \mathcal{K}_{k+1}(A K^{-1}, r_0)$. If $r_0, \ldots, r_{k-1}$ is a Krylov basis of $\mathcal{K}_k(A K^{-1}, r_0)$ and $r_k, A u_k \perp_{K^{-1}} r_{k-1}$, then $r_k, A u_k \perp_{K^{-1}} \mathcal{K}_k(A K^{-1}, r_0)$.
23 Proof (for $r_k$): $r_k = r_{k-1} - \alpha_{k-1} A u_{k-1} \perp_{K^{-1}} r_{k-1}$ by construction of $\alpha_{k-1}$; $r_k = r_{k-1} - \alpha_{k-1} A u_{k-1} \perp_{K^{-1}} \mathcal{K}_{k-1}(A K^{-1}, r_0)$ by induction.
24 Proof (for $A u_k$): $A u_k = A K^{-1} r_k - \beta_k A u_{k-1} \perp_{K^{-1}} r_{k-1}$ by construction of $\beta_k$; $A u_k = A K^{-1} r_k - \beta_k A u_{k-1} \perp_{K^{-1}} \mathcal{K}_{k-1}(A K^{-1}, r_0)$ by induction: $A K^{-1} r_k \perp_{K^{-1}} \mathcal{K}_{k-1}(A K^{-1}, r_0) \Leftarrow r_k \perp_{K^{-1}} A K^{-1} \mathcal{K}_{k-1}(A K^{-1}, r_0) \Leftarrow r_k \perp_{K^{-1}} \mathcal{K}_k(A K^{-1}, r_0)$.
26 $A = A^*$ & $K = K^*$: Preconditioned CG

  x = 0, r = b, u = 0, ρ = 1
  while ‖r‖ > tol do
    solve Kc = r for c
    σ = −ρ, ρ = c*r, β = ρ/σ
    u ← c − β u, c = A u
    σ = u*c, α = ρ/σ
    r ← r − α c
    x ← x + α u
  end while
27 Properties of CG

Pros:
- Low costs per step: 1 MV, 2 DOT, 3 AXPY to increase the dimension of the Krylov subspace by one.
- Low storage: 5 large vectors (incl. b).
- Minimal residual method if $A$, $K$ are positive definite: $\|r_k\|_{A^{-1}}$ is minimal.
- Orthogonal residual method if $A = A^*$, $K = K^*$: $r_k \perp_{K^{-1}} \mathcal{K}_k(A K^{-1}; r_0)$.
- No additional knowledge of the properties of $A$ is needed.
- Robust: CG always converges if $A$, $K$ are positive definite.

Cons:
- May break down if $A = A^*$ is indefinite.
- Does not work if $A \neq A^*$.
- CG is sensitive to evaluation errors if $A = A^*$ is indefinite.
- Often loss of a) super-linear convergence, and b) accuracy, for two reasons: 1) loss of orthogonality in the Lanczos recursion; 2) as in FOM, bumps and peaks in the CG convergence history.
28 Program Lecture 8 (next: Bi-CG).
29 For general square non-singular $A$:
- Apply CG to the normal equations ($A^* A x = A^* b$): CGNE.
- Apply CG to $A A^* y = b$ (then $x = A^* y$): Craig's method.

Disadvantage. The search is in $\mathcal{K}_k(A^* A, \ldots)$: if $A = A^*$, then convergence is determined by $A^2$, i.e., the condition number is squared, ... . Expanding $\mathcal{K}_k$ requires 2 MVs (i.e., many costly steps).

For a discussion of Craig's method, see Exercise 8.1. For Craig versus GCR, see Exercise 8.6.
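A minimal sketch (names mine) of the first option: CG applied to the normal equations without forming $A^*A$ explicitly. Each step costs one MV by A plus one by A*, and the loop monitors the residual A*(b − Ax) of the normal equations rather than b − Ax.

  function x = cgne_sketch(A, b, tol)
    % CG on A'*A*x = A'*b; the search space is K_k(A'*A, A'*b)
    x = zeros(size(b)); r = A'*b; u = zeros(size(b)); rho = 1;
    while norm(r) > tol
      sigma = -rho; rho = r'*r; beta = rho/sigma;
      u = r - beta*u;
      c = A'*(A*u);                   % 2 MVs per step: one by A, one by A'
      sigma = u'*c; alpha = rho/sigma;
      r = r - alpha*c;
      x = x + alpha*u;
    end
  end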
30 [Faber–Manteuffel '90] Theorem. For a general square non-singular $A$, there is no Krylov solver that finds the best solution in the Krylov subspace $\mathcal{K}_k(A, r_0)$ using short recurrences.
31 Alternative. Construct the residuals in a sequence of shrinking spaces (orthogonal to a sequence of growing spaces): adapt the construction of CG.
32 Conjugate Gradients, $A = A^*$, $K = I$:
$u_k = r_k - \beta_k u_{k-1}$, $r_{k+1} = r_k - \alpha_k A u_k$.

Theorem. $r_k, u_k \in \mathcal{K}_{k+1}(A, r_0)$. If $r_0, \ldots, r_{k-1}$ is a Krylov basis of $\mathcal{K}_k(A, r_0)$ and $r_k, A u_k \perp r_{k-1}$, then $r_k, A u_k \perp \mathcal{K}_k(A, r_0)$.

Proof (for $A u_k$): $A u_k = A r_k - \beta_k A u_{k-1} \perp r_{k-1}$ by construction of $\beta_k$; $\perp \mathcal{K}_{k-1}(A, r_0)$ by induction: $A r_k \perp \mathcal{K}_{k-1}(A, r_0) \Leftarrow r_k \perp A \mathcal{K}_{k-1}(A, r_0) \Leftarrow r_k \perp \mathcal{K}_k(A, r_0)$ (using $A = A^*$).
33 Bi-Conjugate Gradients:
$u_k = r_k - \beta_k u_{k-1}$, $r_{k+1} = r_k - \alpha_k A u_k$.

Theorem. We have $r_k, u_k \in \mathcal{K}_{k+1}(A, r_0)$. Suppose $\tilde r_0, \ldots, \tilde r_{k-1}$ is a Krylov basis of $\mathcal{K}_k(A^*, \tilde r_0)$. If $r_k, A u_k \perp \tilde r_{k-1}$, then $r_k, A u_k \perp \mathcal{K}_k(A^*, \tilde r_0)$.

Proof. $r_k = r_{k-1} - \alpha_{k-1} A u_{k-1} \perp \tilde r_{k-1}$ by construction of $\alpha_{k-1}$; $\perp \mathcal{K}_{k-1}(A^*, \tilde r_0)$ by induction. $A u_k = A r_k - \beta_k A u_{k-1} \perp \tilde r_{k-1}$ by construction of $\beta_k$; $\perp \mathcal{K}_{k-1}(A^*, \tilde r_0)$ by induction: $A r_k \perp \mathcal{K}_{k-1}(A^*, \tilde r_0) \Leftarrow r_k \perp A^* \mathcal{K}_{k-1}(A^*, \tilde r_0) \Leftarrow r_k \perp \mathcal{K}_k(A^*, \tilde r_0)$.
34 Bi-Conjugate Gradients: $u_k = r_k - \beta_k u_{k-1}$, $r_{k+1} = r_k - \alpha_k A u_k$; $r_k, u_k \in \mathcal{K}_{k+1}(A, r_0)$, $r_k, A u_k \perp \tilde r_{k-1}$.

With $\rho_k \equiv (r_k, \tilde r_k)$ & $\sigma_k \equiv (A u_k, \tilde r_k)$ and, since $\tilde r_k + \bar\vartheta_k A^* \tilde r_{k-1} \in \mathcal{K}_k(A^*, \tilde r_0)$ for some $\vartheta_k$ (the bar is the complex conjugate), we have that
$\alpha_k = \frac{(r_k, \tilde r_k)}{(A u_k, \tilde r_k)} = \frac{\rho_k}{\sigma_k}$ and $\beta_k = \frac{(A r_k, \tilde r_{k-1})}{(A u_{k-1}, \tilde r_{k-1})} = \frac{(r_k, A^* \tilde r_{k-1})}{\sigma_{k-1}} = -\frac{\rho_k}{\vartheta_k \sigma_{k-1}}$.
35 Bi-Conjugate Gradients: $u_k = r_k - \beta_k u_{k-1}$, $r_{k+1} = r_k - \alpha_k A u_k$; $r_k, u_k \in \mathcal{K}_{k+1}(A, r_0)$, $r_k, A u_k \perp \tilde r_{k-1}$.

With $\rho_k \equiv (r_k, \bar q_k(A^*) \tilde r_0)$ & $\sigma_k \equiv (A u_k, \bar q_k(A^*) \tilde r_0)$ and, since $q_k(\zeta) + \vartheta_k \zeta\, q_{k-1}(\zeta) \in \mathcal{P}_{k-1}$ for some $\vartheta_k$, we have that $\alpha_k = \frac{\rho_k}{\sigma_k}$ & $\beta_k = -\frac{\rho_k}{\vartheta_k \sigma_{k-1}}$.
36 Classical Bi-CG [Fletcher '76] generates the shadow residuals $\tilde r_k = \bar q_k(A^*) \tilde r_0$ with the same polynomial as $r_k$ ($q_k = p_k$): $r_k = p_k(A) r_0$, $\tilde r_k = \bar p_k(A^*) \tilde r_0$; i.e., compute $\tilde r_{k+1}$ as $\tilde r_{k+1} = \tilde r_k - \bar\alpha_k A^* \tilde u_k$, with $\tilde u_k = \tilde r_k - \bar\beta_k \tilde u_{k-1}$. In particular, $\vartheta_k = \alpha_{k-1}$.

However, other choices for $q_k$ are possible as well. Example: $q_k(\zeta) = (1 - \omega_{k-1}\zeta)\, q_{k-1}(\zeta)$ ($\zeta \in \mathbb{C}$). Then $\vartheta_k = \omega_{k-1}$ and $\tilde r_k = \tilde r_{k-1} - \bar\omega_{k-1} A^* \tilde r_{k-1}$, with, for instance, $\omega_{k-1}$ chosen to minimize $\|\tilde r_k\|_2$. The next transparency displays classical Bi-CG.
37 Bi-CG [Fletcher '76]

  x = 0, r = b. Choose r̃.
  u = 0, ρ = 1, ũ = 0
  while ‖r‖ > tol do
    σ = −ρ, ρ = (r, r̃), β = ρ/σ
    u ← r − β u, c = A u;  ũ ← r̃ − β̄ ũ, c̃ = A* ũ
    σ = (c, r̃), α = ρ/σ
    r ← r − α c;  r̃ ← r̃ − ᾱ c̃
    x ← x + α u
  end while
38 Bi-CG [Fletcher '76], with the shadow direction eliminated (maintain c̃ = A*ũ directly):

  x = 0, r = b. Choose r̃.
  u = 0, ρ = 1, c̃ = 0
  while ‖r‖ > tol do
    σ = −ρ, ρ = (r, r̃), β = ρ/σ
    u ← r − β u, c = A u;  c̃ ← A* r̃ − β̄ c̃
    σ = (c, r̃), α = ρ/σ
    r ← r − α c;  r̃ ← r̃ − ᾱ c̃
    x ← x + α u
  end while
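The first version of the algorithm (slide 37) in MATLAB (function form, the random shadow residual and all names are mine); the inner product (x, y) is taken as y'*x, as in the BiCGstab(ℓ) code later in these notes.

  function x = bicg_sketch(A, b, tol)
    n = length(b);
    x = zeros(n,1); r = b; rt = rand(n,1);    % random shadow residual
    u = zeros(n,1); ut = zeros(n,1); rho = 1;
    while norm(r) > tol
      sigma = -rho; rho = rt'*r; beta = rho/sigma;
      u = r - beta*u;           c = A*u;      % MV by A
      ut = rt - conj(beta)*ut;  ct = A'*ut;   % MV by A'
      sigma = rt'*c; alpha = rho/sigma;
      r = r - alpha*c;  rt = rt - conj(alpha)*ct;
      x = x + alpha*u;
    end
  end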
39 Selecting the initial shadow residual $\tilde r_0$. Often recommended: $\tilde r_0 = r_0$. Practical experience: select $\tilde r_0$ randomly (unless $A = A^*$).

Exercise. Bi-CG and CG coincide if $A$ is Hermitian and $\tilde r_0 = r_0$.
Exercise. Derive a version of Bi-CG that includes a preconditioner $K$. Show that Bi-CG and CG coincide if $A$ and $K$ are Hermitian and $\tilde r_0 = K^{-1} r_0$.
Exercise 8.9 gives an alternative derivation of Bi-CG.
40 Properties of Bi-CG

Pros:
- Usually selects good approximations from the search subspaces (Krylov subspaces).
- Low costs per step: 2 DOT, 5 AXPY.
- Low storage: 8 large vectors.
- No knowledge of the properties of $A$ is needed.

Cons:
- Non-optimal Krylov subspace method.
- Not robust: Bi-CG may break down.
- Bi-CG is sensitive to evaluation errors (often loss of super-linear convergence).
- Convergence depends on the shadow residual $\tilde r_0$.
- 2 MVs are needed to expand the search subspace by 1 vector; 1 MV is by $A^*$.
42 Program Lecture 8 (next: Bi-Lanczos).
43 Bi-Lanczos. Find coefficients $\alpha_k$, $\beta_k$, $\tilde\alpha_k$ and $\tilde\beta_k$ such that (bi-orthogonalize)
$\gamma_k v_{k+1} = A v_k - \alpha_k v_k - \beta_k v_{k-1} - \ldots \perp w_k, w_{k-1}, \ldots$
$\tilde\gamma_k w_{k+1} = A^* w_k - \tilde\alpha_k w_k - \tilde\beta_k w_{k-1} - \ldots \perp v_k, v_{k-1}, \ldots$
Select appropriate scaling coefficients $\gamma_k$ and $\tilde\gamma_k$. Then $A V_k = V_{k+1}\underline{H}_k$ with $\underline{H}_k$ Hessenberg, $A^* W_k = W_{k+1}\underline{\widetilde H}_k$ with $\underline{\widetilde H}_k$ Hessenberg, and $W_{k+1}^* V_{k+1} = D_{k+1}$ diagonal.

Exercise. $T_k \equiv W_k^* A V_k = D_k H_k = \widetilde H_k^* D_k$ is tridiagonal. Exploit $H_k = D_k^{-1} \widetilde H_k^* D_k$ and the tridiagonal structure: Bi-Lanczos.
44 (Slide 43 repeated.) See Exercise 8.7 for details.
45 Lanczos [Lanczos '50]

  ρ = ‖r_0‖, v_1 = r_0/ρ
  β_0 = 0, v_0 = 0
  for k = 1, 2, ... do
    ṽ = A v_k − β_{k−1} v_{k−1}
    α_k = v_k* ṽ, ṽ ← ṽ − α_k v_k
    β_k = ‖ṽ‖, v_{k+1} = ṽ/β_k
  end for
46 Bi-Lanczos [Lanczos '50]

  Select an r_0 and an r̃_0.
  v_1 = r_0/‖r_0‖, v_0 = 0; w_1 = r̃_0/‖r̃_0‖, w_0 = 0
  γ_0 = 0, δ_0 = 1, γ̃_0 = 0, δ̃_0 = 1
  for k = 1, 2, ... do
    δ_k = w_k* v_k
    ṽ = A v_k, β_k = γ̃_{k−1} δ_k/δ_{k−1}, ṽ ← ṽ − β_k v_{k−1}
    α_k = w_k* ṽ/δ_k, ṽ ← ṽ − α_k v_k
    w̃ = A* w_k, β̃_k = γ_{k−1} δ_k/δ_{k−1}, w̃ ← w̃ − β̃_k w_{k−1}
    α̃_k = ᾱ_k, w̃ ← w̃ − α̃_k w_k
    Select a γ_k ≠ 0 and a γ̃_k ≠ 0
    v_{k+1} = ṽ/γ_k, w_{k+1} = w̃/γ̃_k
    V_k = [V_{k−1}, v_k], W_k = [W_{k−1}, w_k]
  end for
47 Arnoldi: $A V_k = V_{k+1}\underline{H}_k$. If $A = A^*$, then $T_k \equiv H_k$ is tridiagonal: Lanczos.
Lanczos + $T = LU$ + efficient implementation → CG.
Bi-Lanczos + $T = LU$ + efficient implementation → Bi-CG.
48 (The Bi-CG algorithm of slide 38, repeated.)
49 Bi-CG may break down.

0) Lucky breakdown if $r_k = 0$.
1) Pivot breakdown or LU-breakdown: the LU-decomposition $T_k = L_k U_k$ may not exist; corresponds to $\sigma = 0$ in Bi-CG. Remedy: composite-step Bi-CG (skip forming $T_k = L_k U_k$ once), or form $T = QR$ as in MINRES (from the beginning): simple QMR (Quasi-Minimal Residuals).
2) Bi-Lanczos may break down: a diagonal element of $D_k$ may be zero; corresponds to $\rho = 0$ in Bi-CG. Remedy: look-ahead.
General remedies: restart; look-ahead in QMR.
50 Note. CG may suffer from pivot breakdown when applied to a Hermitian, non-definite matrix ($A = A^*$ with positive as well as negative eigenvalues): MINRES and SYMMLQ cure this breakdown.

Note. Exact breakdowns are rare. However, near-breakdowns lead to irregular convergence and instabilities, and hence to loss of speed of convergence and loss of accuracy.
51 (Properties of Bi-CG repeated; see slide 40.)
52 Program Lecture 8 (next: Hybrid Bi-CG).
53 Bi-Conjugate Gradients, $K = I$ (slide 35 recalled): with $\rho_k \equiv (r_k, \bar q_k(A^*) \tilde r_0)$ & $\sigma_k \equiv (A u_k, \bar q_k(A^*) \tilde r_0)$ and, since $q_k(\zeta) + \vartheta_k \zeta\, q_{k-1}(\zeta) \in \mathcal{P}_{k-1}$ for some $\vartheta_k$, we have that $\alpha_k = \rho_k/\sigma_k$ & $\beta_k = -\rho_k/(\vartheta_k \sigma_{k-1})$.
54 Transpose-free Bi-CG [Sonneveld '89]

$\rho_k = (r_k, \bar q_k(A^*) \tilde r_0) = (q_k(A) r_k, \tilde r_0)$, $\sigma_k = (A u_k, \bar q_k(A^*) \tilde r_0) = (A q_k(A) u_k, \tilde r_0)$. Write $Q_k \equiv q_k(A)$.

(Bi-CG) $\rho_k$; $Q_k u_k = Q_k r_k - \beta_k Q_k u_{k-1}$; $\sigma_k$; $Q_k r_{k+1} = Q_k r_k - \alpha_k A Q_k u_k$.
(Pol) Compute $q_{k+1}$ of degree $k+1$ s.t. $q_{k+1}(0) = 1$; compute $Q_{k+1} u_k$, $Q_{k+1} r_{k+1}$ (from $Q_k u_k$, $Q_k r_{k+1}$, ...).
56 Example. $q_{k+1}(\zeta) = (1 - \omega_k \zeta)\, q_k(\zeta)$ ($\zeta \in \mathbb{C}$). Then: $\omega_k$; $Q_{k+1} u_k = Q_k u_k - \omega_k A Q_k u_k$; $Q_{k+1} r_{k+1} = Q_k r_{k+1} - \omega_k A Q_k r_{k+1}$.
57 Transpose-free Bi-CG: practice. Work with $u_k \equiv Q_k u_k^{\mathrm{BiCG}}$ and $r_{k+1} \equiv Q_k r_{k+1}^{\mathrm{BiCG}}$, writing $u_{k-1}$ and $r_k$ instead of $Q_k u_{k-1}^{\mathrm{BiCG}}$ and $Q_k r_k^{\mathrm{BiCG}}$, resp. Then $\rho_k = (r_k, \tilde r_0)$ and $\sigma_k = (A u_k, \tilde r_0)$:

(Bi-CG) $\rho_k = (r_k, \tilde r_0)$, $u_k = r_k - \beta_k u_{k-1}$; $\sigma_k = (A u_k, \tilde r_0)$, $r_k \leftarrow r_k - \alpha_k A u_k$, $x_k \leftarrow x_k + \alpha_k u_k$.
(Pol) Compute the updating coefficients for $q_{k+1}$; compute $u_k$, $r_{k+1}$, $x_{k+1}$.

Example: $\omega_k$; $u_{k+1} = u_k - \omega_k A u_k$, $r_{k+1} = r_k - \omega_k A r_k$, $x_{k+1} = x_k + \omega_k r_k$.
58 Example. $q_{k+1}(\zeta) = (1 - \omega_k \zeta)\, q_k(\zeta)$ ($\zeta \in \mathbb{C}$). How to choose $\omega_k$?

Bi-CGSTABilized: with $s_k \equiv A r_k$, $\omega_k = \operatorname{argmin}_\omega \|r_k - \omega A r_k\|_2 = \frac{s_k^* r_k}{s_k^* s_k}$, as in the Local Minimal Residual method, or, equivalently, as in GCR(1).
59 BiCGSTAB [van der Vorst '92]

  x = 0, r = b. u = 0, ω = σ = 1. Choose r̃.
  while ‖r‖ > tol do
    σ ← −ωσ, ρ = (r, r̃), β = ρ/σ
    u ← r − β u, c = A u
    σ = (c, r̃), α = ρ/σ
    r ← r − α c, x ← x + α u
    s = A r, ω = (r, s)/(s, s)
    u ← u − ω c
    x ← x + ω r
    r ← r − ω s
  end while
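A MATLAB transcription of the box (function form and names are mine). Note the order of the last three updates: x is updated with r before r itself is overwritten, and σ ← −ωσ carries the sign convention of these notes from step to step.

  function x = bicgstab_sketch(A, b, tol)
    n = length(b);
    x = zeros(n,1); r = b; rt = rand(n,1);   % random shadow residual
    u = zeros(n,1); c = zeros(n,1); omega = 1; sigma = 1;
    while norm(r) > tol
      sigma = -omega*sigma; rho = rt'*r; beta = rho/sigma;
      u = r - beta*u; c = A*u;               % Bi-CG part (1st MV)
      sigma = rt'*c; alpha = rho/sigma;
      r = r - alpha*c; x = x + alpha*u;
      s = A*r;                               % minimal residual part (2nd MV)
      omega = (s'*r)/(s'*s);
      u = u - omega*c;
      x = x + omega*r;                       % uses r before the update below
      r = r - omega*s;
    end
  end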
60 Hybrid Bi-CG or product-type Bi-CG: $r_k \equiv q_k(A)\, r_k^{\mathrm{BiCG}} = q_k(A)\, p_k^{\mathrm{BiCG}}(A)\, r_0$, where $p_k^{\mathrm{BiCG}}$ is the $k$th Bi-CG residual polynomial.

How to select $q_k$? Choose $q_k$ for efficient steps & fast convergence. Fast convergence by: reducing the residual; stabilizing the Bi-CG part; or, when used as the linear solver for the Jacobian systems in a Newton scheme for non-linear equations, by reducing the number of Newton steps.
61 Hybrid Bi-CG examples:
- CGS: Bi-CG × Bi-CG, Sonneveld [1989]
- Bi-CGSTAB: GCR(1) × Bi-CG, van der Vorst [1992]
- GPBi-CG: 2-truncated GCR × Bi-CG, Zhang [1997]
- BiCGstab(ℓ): GCR(ℓ) × Bi-CG, Sleijpen–Fokkema [1993]

For more details on hybrid Bi-CG, see Exercise 8.11 and Exercise … . For a derivation of GPBi-CG, see Exercise 8.13.
62 Properties of hybrid Bi-CG

Pros:
- Often converges twice as fast as Bi-CG w.r.t. the number of MVs: each MV expands the search subspace. Bi-CG: $x_k - x_0 \in \mathcal{K}_k(A; r_0)$ at 2k MV; hybrid Bi-CG: $x_k - x_0 \in \mathcal{K}_{2k}(A; r_0)$ at 2k MV.
- Work per MV and storage similar to Bi-CG.
- Transpose-free.
- Explicit computation of the Bi-CG scalars.

Cons:
- Non-optimal Krylov subspace method.
- Peaks in the convergence history.
- Large intermediate residuals.
- Breakdown possibilities.
63 Conjugate Gradients Squared [Sonneveld '89]: $r_k = p_k^{\mathrm{BiCG}}(A)\, p_k^{\mathrm{BiCG}}(A)\, r_0$. CGS exploits recurrence relations for the Bi-CG polynomials to design a very efficient algorithm.

Properties:
+ Hybrid Bi-CG.
+ A very efficient algorithm: 1 DOT/MV, 3.25 AXPY/MV; storage: 7 large vectors.
− Often high peaks in its convergence history.
− Often large intermediate residuals.
+ Seems to do well as the linear solver in a Newton scheme.
64 Conjugate Gradients Squared [Sonneveld '89]

  x = 0, r = b. u = w = 0, ρ = 1. Choose r̃.
  while ‖r‖ > tol do
    σ = −ρ, ρ = (r, r̃), β = ρ/σ
    w ← u − β w
    v = r − β u
    w ← v − β w, c = A w
    σ = (c, r̃), α = ρ/σ
    u = v − α c
    r ← r − α A(v + u)
    x ← x + α (v + u)
  end while
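The CGS box in the same MATLAB form (function form and names are mine); the two MVs per step are A*w and A*(v + u).

  function x = cgs_sketch(A, b, tol)
    n = length(b);
    x = zeros(n,1); r = b; rt = rand(n,1);   % random shadow residual
    u = zeros(n,1); w = zeros(n,1); rho = 1;
    while norm(r) > tol
      sigma = -rho; rho = rt'*r; beta = rho/sigma;
      w = u - beta*w;
      v = r - beta*u;
      w = v - beta*w; c = A*w;               % 1st MV
      sigma = rt'*c; alpha = rho/sigma;
      u = v - alpha*c;
      r = r - alpha*(A*(v + u));             % 2nd MV
      x = x + alpha*(v + u);
    end
  end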
65 Program Lecture 8 (next: Bi-CGSTAB, BiCGstab(ℓ)).
66 Properties of Bi-CGSTAB

Pros:
- Hybrid Bi-CG.
- Converges faster (& smoother) than CGS.
- More accurate than CGS.
- 2 DOT/MV, 3 AXPY/MV; storage: 6 large vectors.

Cons: danger of (A) Lanczos breakdown ($\rho_k = 0$), (B) pivot breakdown ($\sigma_k = 0$), (C) breakdown of the minimization ($\omega_k = 0$).
67 [Figure: convergence history, log10 of residual norm versus number of MVs, for Bi-CG, BiCGSTAB, GCR(100) and 100-truncated GCR.] Problem: $-(a u_x)_x - (a u_y)_y = 1$ on $[0,1] \times [0,1]$; $a = 1000$ for $0.1 \le x, y \le 0.9$ and $a = 1$ elsewhere; Dirichlet BC on $y = 0$, Neumann BC on the other parts of the boundary. Finite volumes; ILU decomposition.
68 [Figure: problem layout on the unit square, with boundary values $u = 0$ and $u = 1$, a region with $F = 100$, and subregions with $A = 10^4$, $A = 10^{-5}$, $A = 10^2$.] Problem: $-(a u_x)_x - (a u_y)_y + b u_x = f$ on $[0,1] \times [0,1]$; the figure defines $a = A$ and $f = F$; $F = 0$ except ...
69 [Figure: convergence history, log10 of residual norm versus number of matrix multiplications, for Bi-CG (dash-dot), Bi-CGSTAB (dashed) and BiCGstab(2) (solid).] Problem: $-(a u_x)_x - (a u_y)_y + b u_x = f$ on $[0,1] \times [0,1]$; $b(x,y) = 2\exp(2(x^2 + y^2))$, $a$ changes strongly; Dirichlet BC. Finite volumes; ILU decomposition.
71 Breakdown of the minimization.

Exact arithmetic, $\omega_k = 0$: no reduction of the residual by $Q_{k+1} r_{k+1} = (I - \omega_k A)\, Q_k r_{k+1}^{\mathrm{BiCG}}$ (*); $q_{k+1}$ is of degree $k$: the Bi-CG scalars cannot be computed; breakdown of the incorporated Bi-CG.

Finite precision arithmetic, $\omega_k \approx 0$: poor reduction of the residual by (*); the Bi-CG scalars are seriously affected by evaluation errors: drop of the speed of convergence.

$\omega_k \approx 0$ is to be expected if $A$ is real and has eigenvalues with relatively large imaginary part: $\omega_k$ is real!
72 BiCGstab(ℓ): cycle ℓ times through the Bi-CG part to compute $A^j u$, $A^j r$ for $j = 0, \ldots, \ell$, where now $u \equiv Q_k u_{k+\ell-1}^{\mathrm{BiCG}}$ and $r \equiv Q_k r_{k+\ell}^{\mathrm{BiCG}}$ for $k = m\ell$. Then take $\gamma_m = \operatorname{argmin}_\gamma \|r - [A r, \ldots, A^\ell r]\, \gamma\|_2$.
73 This gives $r_{k+\ell} = r - [A r, \ldots, A^\ell r]\, \gamma_m$ and $q_{k+\ell}(\zeta) = (1 - [\zeta, \ldots, \zeta^\ell]\, \gamma_m)\, q_k(\zeta)$ ($\zeta \in \mathbb{C}$).
74 BiCGstab(ℓ) for ℓ ≥ 2 [Sleijpen–Fokkema '93, Sleijpen–van der Vorst–Fokkema '94]

$q_{k+1}(A) = A\, q_k(A)$ if $k \neq m\ell$; $q_{m\ell+\ell}(A) = \phi_m(A)\, q_{m\ell}(A)$, where $\phi_m$ is of exact degree ℓ, $\phi_m(0) = 1$, and, for $k = m\ell$, $\phi_m$ minimizes $\|\phi_m(A)\, r\|_2$ with $r \equiv q_{m\ell}(A)\, r_{m\ell+\ell}^{\mathrm{BiCG}}$. $\phi_m$ is a GCR residual polynomial of degree ℓ. Note that real polynomials of degree 2 can have complex zeros.
75 Minimization in practice: $\phi_m(\zeta) = 1 - \sum_{j=1}^{\ell} \gamma_j^{(m)} \zeta^j$ with $(\gamma_j^{(m)}) = \operatorname{argmin}_{(\gamma_j)} \|r - \sum_{j=1}^{\ell} \gamma_j A^j r\|_2$. Compute $A r, A^2 r, \ldots, A^\ell r$ explicitly. With $\mathbf{R} \equiv [A r, \ldots, A^\ell r]$ and $\gamma_m \equiv (\gamma_1^{(m)}, \ldots, \gamma_\ell^{(m)})^{\mathsf T}$ we have the normal equations $(\mathbf{R}^* \mathbf{R})\, \gamma_m = \mathbf{R}^* r$ [use Cholesky].
76 BiCGstab(ℓ)

  x = 0, r = [b]. Choose r̃. u = [0], γ_ℓ = σ = 1.
  while ‖r‖ > tol do
    σ ← −γ_ℓ σ
    for j = 1 to ℓ do
      ρ = (r_j, r̃), β = ρ/σ
      u ← r − β u, u ← [u, A u_j]
      σ = (u_{j+1}, r̃), α = ρ/σ
      r ← r − α u_{2:j+1}, r ← [r, A r_j]
      x ← x + α u_1
    end for
    R = r_{2:ℓ+1}. Solve (R*R) γ = R* r_1 for γ
    x ← x + (γ_1 r_1 + … + γ_ℓ r_ℓ)
    u ← u_1 − (γ_1 u_2 + … + γ_ℓ u_{ℓ+1})
    r ← r_1 − (γ_1 r_2 + … + γ_ℓ r_{ℓ+1})
  end while
77 MATLAB code:

  epsilon = 10^(-16); ell = 4;
  x = zeros(size(b)); rt = rand(size(b));
  sigma = 1; omega = 1; u = zeros(size(b));
  y = MV(x); r = b - y;                 % r = b - A*x
  nrm = r'*r; nepsilon = nrm*epsilon^2;
  L = 2:ell+1;
  while nrm > nepsilon
    sigma = -omega*sigma; y = r;
    for j = 1:ell                       % Bi-CG part: grow u and r columnwise
      rho = rt'*y; beta = rho/sigma;
      u = r - beta*u;
      y = MV(u(:,j)); u(:,j+1) = y;
      sigma = rt'*y; alpha = rho/sigma;
      r = r - alpha*u(:,2:j+1);
      x = x + alpha*u(:,1);
      y = MV(r(:,j)); r(:,j+1) = y;
    end
    G = r'*r; gamma = G(L,L)\G(L,1);    % normal equations for the MR step
    omega = gamma(ell);
    x = x + r*[gamma; 0];               % uses the old columns of r
    u = u*[1; -gamma];
    r = r*[1; -gamma];
    nrm = r'*r;
  end
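As transcribed this is a script, not a function: it assumes a right-hand side b in the workspace and a routine MV(x) that returns the product Ax. The stopping criterion compares the squared residual norm with its initial value times epsilon^2, i.e., it asks for a relative residual reduction of about epsilon. During a sweep, the arrays u and r collect [u, Au, ..., A^j u] and [r, Ar, ..., A^j r] column by column; the final three updates then use these columns and collapse u and r back to single vectors, which is why the x-update must come before r is collapsed.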
78 [Figure: convergence history, log10 of residual norm versus number of matrix multiplications, for Bi-CG (dash-dot), Bi-CGSTAB (dashed) and BiCGstab(2) (solid).] Problem: $u_{xx} + u_{yy} + u_{zz} - u_x = f$, with $f$ such that $u(x,y,z) = \exp(xyz)\sin(\pi x)\sin(\pi y)\sin(\pi z)$. (…) volumes; no preconditioning.
79 [Figure: convergence history, log10 of residual norm versus number of matrix multiplications, for BiCGStab2 (dash-dot), Bi-CGSTAB (dotted), BiCGstab(2) (dashed) and BiCGstab(4) (solid).] Problem: $-(a u_x)_x - (a u_y)_y = 1$ on $[0,1] \times [0,1]$; $a = 1000$ for $0.1 \le x, y \le 0.9$ and $a = 1$ elsewhere; Dirichlet BC on $y = 0$, Neumann BC on the other parts of the boundary. Finite volumes; ILU decomposition.
80 [Figure: convergence history, same methods as the previous figure.] Problem: $\epsilon(u_{xx} + u_{yy}) + a(x,y) u_x + b(x,y) u_y = 0$ on $[0,1] \times [0,1]$, Dirichlet BC; $\epsilon = 10^{-1}$, $a(x,y) = 4x(x-1)(1-2y)$, $b(x,y) = 4y(1-y)(1-2x)$, $u(x,y) = \sin(\pi x) + \sin(13\pi x) + \sin(\pi y) + \sin(13\pi y)$. (…) volumes; no preconditioning.
81 [Figure.] Same problem as slide 78: $u_{xx} + u_{yy} + u_{zz} - u_x = f$, with $f$ defined by the solution $u(x,y,z) = \exp(xyz)\sin(\pi x)\sin(\pi y)\sin(\pi z)$. (…) volumes; no preconditioning.
82 Accurate Bi-CG coefficients. For $\rho_k = (r_k, \tilde r_0)$, the computed value $\widehat\rho_k = \rho_k(1 + \epsilon)$ satisfies $|\epsilon| \lesssim n\,\xi\, \frac{\|r_k\|_2 \|\tilde r_0\|_2}{|(r_k, \tilde r_0)|} = \frac{n\,\xi}{\rho_k'}$, where $\rho_k' \equiv \frac{|(r_k, \tilde r_0)|}{\|r_k\|_2 \|\tilde r_0\|_2}$ (with $\xi$ the machine precision).
83 Why use polynomial factors of degree ≥ 2?

Why: a hybrid Bi-CG that is faster than Bi-CGSTAB. One sweep of BiCGstab(ℓ) versus ℓ steps of Bi-CGSTAB: the reduction with an MR-polynomial of degree ℓ is better than with ℓ MR-polynomials of degree 1, and the MR-polynomial of degree ℓ contributes only once to an increase of $\rho_k'$.

Why not: efficiency (ℓ DOT/MV, ℓ AXPY/MV; storage: 2ℓ + 5 large vectors); loss of accuracy ($\|r_k - (b - A x_k)\| \le (\ldots + c\,\xi\, \max_i |\gamma_i|\, \|A\|\, \|A^{i-1} r\|)$); breakdowns are possible.
84 (Properties of Bi-CG repeated; see slide 40.)
85 Program Lecture 8 (next: IDR).
86 Hybrid Bi-CG. Notation: if $p_k$ is a polynomial of exact degree $k$ and $\tilde r_0$ an $n$-vector, let $S(p_k, A, \tilde r_0) \equiv \{p_k(A) v \mid v \perp \mathcal{K}_k(A^*, \tilde r_0)\}$.

Theorem. Hybrid Bi-CG finds residuals $r_k \in S(p_k, A, \tilde r_0)$.

Example. Bi-CGSTAB: $p_k(\lambda) = (1 - \omega_k \lambda)\, p_{k-1}(\lambda)$, where, in every step, $\omega_k = \operatorname{argmin}_\omega \|r - \omega A r\|_2$ with $r = p_{k-1}(A) v$, $v = r_k^{\mathrm{BiCG}}$.
87 Example. BiCGstab(ℓ): $p_k(\lambda) = (1 - \omega_k \lambda)\, p_{k-1}(\lambda)$, where, every ℓth step, $\gamma = \operatorname{argmin}_\gamma \|r - [A r, \ldots, A^\ell r]\, \gamma\|_2$ with $r = p_{k-\ell}(A)\, r_k^{\mathrm{BiCG}}$, and $(1 - \gamma_1 \lambda - \ldots - \gamma_\ell \lambda^\ell) = (1 - \omega_k \lambda) \cdots (1 - \omega_{k-\ell+1} \lambda)$.
88 Induced Dimension Reduction. Definition: if $p_k$ is a polynomial of exact degree $k$ and $\widetilde R \equiv \widetilde R_0 = [\tilde r_1, \ldots, \tilde r_s]$ an $n \times s$ matrix, then $S(p_k, A, \widetilde R) \equiv \{p_k(A) v \mid v \perp \mathcal{K}_k(A^*, \widetilde R)\}$ is the $p_k$-Sonneveld subspace. Here $\mathcal{K}_k(A^*, \widetilde R) \equiv \{\sum_{j=0}^{k-1} (A^*)^j \widetilde R\, \gamma_j \mid \gamma_j \in \mathbb{C}^s\}$.

Theorem. IDR finds residuals $r_k \in S(p_k, A, \widetilde R)$.

Example ($s = 1$). Bi-CGSTAB, as on slide 86.
89 IDR [van Gijzen–Sonneveld '07]

  Select an x_0. Select n×s matrices U and R̃. Compute C = AU.
  x = x_0, r = b − Ax, j = s, i = 1
  while ‖r‖ > tol do
    solve R̃*C γ = R̃*r for γ
    v = r − Cγ, t = Av
    j ← j + 1; if j > s: ω = t*v/(t*t), j = 0
    U e_i ← Uγ + ωv, x ← x + U e_i
    r_old = r, r ← v − ωt, C e_i = r_old − r
    i ← i + 1; if i > s: i = 1
  end while
90 Select the $n \times s$ matrices $U$ and $\widetilde R$. Experiments suggest $\widetilde R = \mathrm{qr}(\mathrm{rand}(n, s), 0)$. $U$ and $C$ can be constructed from $s$ steps of GCR. We will discuss IDR in more detail in Lecture 11. See also Exercises 11.1–11.5.
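A MATLAB transcription of the prototype on the previous slide, using the suggested choices for R̃ and U (function form and all names are mine; the vector Av is called t to avoid a clash with the integer s).

  function x = idr_sketch(A, b, s, tol)
    % IDR prototype with s shadow vectors
    n = length(b);
    [Rt, ~] = qr(rand(n, s), 0);       % shadow space: orthonormalized random
    U = rand(n, s); C = A*U;           % the invariant C = A*U is maintained
    x = zeros(n, 1); r = b - A*x;
    omega = 1; j = s; i = 1;           % j = s forces a new omega right away
    while norm(r) > tol
      gamma = (Rt'*C) \ (Rt'*r);       % small s x s system
      v = r - C*gamma; t = A*v;
      j = j + 1;
      if j > s, omega = (t'*v)/(t'*t); j = 0; end
      U(:,i) = U*gamma + omega*v;      % right-hand side uses the old U
      x = x + U(:,i);
      r_old = r; r = v - omega*t;
      C(:,i) = r_old - r;              % equals A*U(:,i), preserving C = A*U
      i = i + 1; if i > s, i = 1; end
    end
  end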
More information