A short course on: Preconditioned Krylov subspace methods
Yousef Saad
University of Minnesota, Dept. of Computer Science and Engineering
Universite du Littoral, Jan 19-3, 25

Outline
Part 1: Introduction, discretization of PDEs; sparse matrices and sparsity; basic iterative methods (relaxation, ...)
Part 2: Projection methods; Krylov subspace methods
Part 3: Preconditioned iterations; preconditioning techniques; parallel implementations
Part 4: Eigenvalue problems; applications

THE PROBLEM

We consider the linear system
    Ax = b
where A is N x N and can be
- Real symmetric positive definite
- Real nonsymmetric
- Complex
Focus: A is large and sparse, possibly with an irregular structure.

PROJECTION METHODS FOR LINEAR SYSTEMS
PROJECTION METHODS

Initial Problem: b - Ax = 0
Given two subspaces K and L of R^N, define the approximate problem:
    Find x̃ ∈ K such that b - Ax̃ ⊥ L
This leads to a small linear system ("projected problem"). This is a basic projection step. Typically, a sequence of such steps is applied.

With a nonzero initial guess x_0, the approximate problem is
    Find x̃ ∈ x_0 + K such that b - Ax̃ ⊥ L
Write x̃ = x_0 + δ and r_0 = b - Ax_0. This leads to a system for δ:
    Find δ ∈ K such that r_0 - Aδ ⊥ L

Matrix representation:
Let V = [v_1, ..., v_m] be a basis of K and W = [w_1, ..., w_m] a basis of L.
Then letting x̃ be the approximate solution
    x̃ = x_0 + δ ≡ x_0 + V y,
where y is a vector of R^m, the Petrov-Galerkin condition yields
    W^T (r_0 - A V y) = 0
and therefore
    x̃ = x_0 + V [W^T A V]^{-1} W^T r_0
Remark: In practice W^T A V is known from the algorithm and has a simple structure [tridiagonal, Hessenberg, ...].

PROTOTYPE PROJECTION METHOD

Until Convergence Do:
1. Select a pair of subspaces K and L;
2. Choose bases V = [v_1, ..., v_m] for K and W = [w_1, ..., w_m] for L.
3. Compute
    r := b - Ax,
    y := (W^T A V)^{-1} W^T r,
    x := x + V y.

OPERATOR FORM REPRESENTATION

Let P be the orthogonal projector onto K and Q the (oblique) projector onto K and orthogonally to L:
    Px ∈ K,  x - Px ⊥ K
    Qx ∈ K,  x - Qx ⊥ L
[Figure: the orthogonal projector P and the oblique projector Q]
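The basic projection step above is easy to express in code. Below is a minimal sketch, assuming dense NumPy arrays and a nonsingular projected matrix W^T A V; the function name and its arguments are illustrative rather than taken from the slides.

```python
import numpy as np

def projection_step(A, b, x0, V, W):
    """One Petrov-Galerkin step: x = x0 + V [W^T A V]^{-1} W^T r0."""
    r0 = b - A @ x0                                   # initial residual
    y = np.linalg.solve(W.T @ (A @ V), W.T @ r0)      # small projected system
    return x0 + V @ y
```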
The approximate problem amounts to solving
    Q(b - Ax) = 0,  x ∈ K
or, in operator form,
    Q(b - APx) = 0
Question: what accuracy can one expect?
Let x* be the exact solution. Then
1) we cannot get better accuracy than ||(I - P)x*||_2, i.e.,
    ||x̃ - x*||_2 ≥ ||(I - P)x*||_2
2) the residual of the exact solution for the approximate problem satisfies:
    ||b - QAPx*||_2 ≤ ||QA(I - P)||_2 ||(I - P)x*||_2

Two Important Particular Cases

1. L = AK. Then
    ||b - Ax̃||_2 = min_{z ∈ K} ||b - Az||_2
   Class of Minimal Residual Methods: CR, GCR, ORTHOMIN, GMRES, CGNR, ...
2. L = K. Class of Galerkin or Orthogonal projection methods.
   When A is SPD then ||x* - x̃||_A = min_{z ∈ K} ||x* - z||_A.

One-dimensional projection processes

K = span{d} and L = span{e}. Then x̃ = x + αd, and the Petrov-Galerkin condition r - Aδ ⊥ e yields
    α = (r, e)/(Ad, e)
Three popular choices:

(I) Steepest descent. A is SPD. Take at each step d = r and e = r.
Iteration:  r := b - Ax,  α := (r, r)/(Ar, r),  x := x + αr
Each step minimizes f(x) = ||x - x*||_A^2 = (A(x - x*), x - x*) in the direction -∇f. Convergence guaranteed if A is SPD.

(II) Residual norm steepest descent. A is arbitrary (nonsingular). Take at each step d = A^T r and e = Ad.
Iteration:  r := b - Ax,  d := A^T r,  α := ||d||_2^2 / ||Ad||_2^2,  x := x + αd
Each step minimizes f(x) = ||b - Ax||_2^2 in the direction -∇f.
Important note: equivalent to the usual steepest descent applied to the normal equations A^T Ax = A^T b. Converges under the condition that A is nonsingular.
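The two one-dimensional iterations (I) and (II) above translate directly into code. A minimal sketch under the stated assumptions (A SPD for the first, A nonsingular for the second); stopping tests, iteration caps and function names are illustrative.

```python
import numpy as np

def steepest_descent(A, b, x, tol=1e-8, maxit=1000):
    """Steepest descent for SPD A: take d = e = r at each step."""
    for _ in range(maxit):
        r = b - A @ x
        if np.linalg.norm(r) < tol:
            break
        alpha = (r @ r) / (r @ (A @ r))
        x = x + alpha * r
    return x

def residual_norm_steepest_descent(A, b, x, tol=1e-8, maxit=1000):
    """Minimizes ||b - Ax||_2 along d = A^T r (steepest descent on A^T A x = A^T b)."""
    for _ in range(maxit):
        r = b - A @ x
        d = A.T @ r
        if np.linalg.norm(d) < tol:
            break
        alpha = (d @ d) / np.linalg.norm(A @ d) ** 2
        x = x + alpha * d
    return x
```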
(III) Minimal residual iteration. A is positive definite (A + A^T is SPD). Take at each step d = r and e = Ar.
Iteration:  r := b - Ax,  α := (Ar, r)/(Ar, Ar),  x := x + αr
Each step minimizes f(x) = ||b - Ax||_2^2 in the direction r. Converges under the condition that A + A^T is SPD.

KRYLOV SUBSPACE METHODS

Principle: Projection methods on Krylov subspaces, i.e., on
    K_m(A, v_1) = span{v_1, Av_1, ..., A^{m-1} v_1}
- probably the most important class of iterative methods.
- many variants exist depending on the subspace L.

Simple properties of K_m. Let μ = degree of the minimal polynomial of v.
- K_m = {p(A)v | p = polynomial of degree ≤ m-1}
- K_m = K_μ for all m ≥ μ. Moreover, K_μ is invariant under A.
- dim(K_m) = m iff μ ≥ m.

A little review: Gram-Schmidt process

Goal: given X = [x_1, ..., x_m], compute an orthonormal set Q = [q_1, ..., q_m] which spans the same subspace.

ALGORITHM 1: Classical Gram-Schmidt
1. For j = 1, ..., m Do:
2.   Compute r_ij = (x_j, q_i) for i = 1, ..., j-1
3.   Compute q̂_j = x_j - Σ_{i=1}^{j-1} r_ij q_i
4.   r_jj = ||q̂_j||_2. If r_jj == 0 exit
5.   q_j = q̂_j / r_jj
6. EndDo

ALGORITHM 2: Modified Gram-Schmidt
1. For j = 1, ..., m Do:
2.   q̂_j := x_j
3.   For i = 1, ..., j-1 Do:
4.     r_ij = (q̂_j, q_i)
5.     q̂_j := q̂_j - r_ij q_i
6.   EndDo
7.   r_jj = ||q̂_j||_2. If r_jj == 0 exit
8.   q_j := q̂_j / r_jj
9. EndDo
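Modified Gram-Schmidt (Algorithm 2) can be sketched as follows, assuming dense NumPy arrays with linearly independent columns; the function name is illustrative.

```python
import numpy as np

def modified_gram_schmidt(X):
    """Return Q, R with X = Q R and Q having orthonormal columns."""
    n, m = X.shape
    Q = np.zeros((n, m))
    R = np.zeros((m, m))
    for j in range(m):
        q = np.array(X[:, j], dtype=float)
        for i in range(j):
            R[i, j] = q @ Q[:, i]          # project against already-built q_i
            q -= R[i, j] * Q[:, i]
        R[j, j] = np.linalg.norm(q)
        if R[j, j] == 0.0:
            raise ValueError("columns are linearly dependent")
        Q[:, j] = q / R[j, j]
    return Q, R
```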
Let:
    X = [x_1, ..., x_m]   (n x m matrix)
    Q = [q_1, ..., q_m]   (n x m matrix)
    R = {r_ij}            (m x m upper triangular matrix)
At each step,
    x_j = Σ_{i=1}^{j} r_ij q_i
Result:
    X = QR

ARNOLDI'S ALGORITHM

Goal: to compute an orthogonal basis of K_m.
Input: initial vector v_1, with ||v_1||_2 = 1, and m.
For j = 1, ..., m do
    Compute w := Av_j
    For i = 1, ..., j, do
        h_{i,j} := (w, v_i)
        w := w - h_{i,j} v_i
    h_{j+1,j} := ||w||_2 and v_{j+1} := w / h_{j+1,j}

Arnoldi's Method (L_m = K_m)

From the Petrov-Galerkin condition when L_m = K_m, we get
    x_m = x_0 + V_m H_m^{-1} V_m^T r_0
Result of the orthogonalization process (Arnoldi's algorithm):
1. V_m = [v_1, v_2, ..., v_m] is an orthonormal basis of K_m.
2. AV_m = V_{m+1} H̄_m
3. V_m^T A V_m = H_m ≡ H̄_m minus its last row.
If, in addition, we choose v_1 = r_0 / ||r_0||_2 ≡ r_0/β in Arnoldi's algorithm, then
    x_m = x_0 + V_m H_m^{-1} (βe_1)
Several algorithms are mathematically equivalent to this approach:
* FOM [Saad, 1981] (above formulation)
* Young and Jea's ORTHORES [1982]
* Axelsson's projection method [1981]
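A minimal sketch of the Arnoldi process above with modified Gram-Schmidt orthogonalization; dense NumPy arrays and an illustrative breakdown tolerance are assumed. It returns V_{m+1} and H̄_m so that A V_m = V_{m+1} H̄_m.

```python
import numpy as np

def arnoldi(A, v1, m):
    """Arnoldi with MGS: return V (n x (m+1)) and Hbar ((m+1) x m) with A V[:, :m] = V @ Hbar."""
    n = v1.shape[0]
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    V[:, 0] = v1 / np.linalg.norm(v1)
    for j in range(m):
        w = A @ V[:, j]
        for i in range(j + 1):
            H[i, j] = w @ V[:, i]
            w -= H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] < 1e-14:
            # happy breakdown: K_{j+1} is invariant under A
            return V[:, : j + 1], H[: j + 1, : j + 1]
        V[:, j + 1] = w / H[j + 1, j]
    return V, H
```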
Minimal residual methods (L_m = AK_m)

When L_m = AK_m, we let W_m ≡ AV_m and obtain the relation
    x_m = x_0 + V_m [W_m^T A V_m]^{-1} W_m^T r_0
        = x_0 + V_m [(AV_m)^T AV_m]^{-1} (AV_m)^T r_0
Use again v_1 := r_0/(β := ||r_0||_2) and the relation AV_m = V_{m+1} H̄_m:
    x_m = x_0 + V_m [H̄_m^T H̄_m]^{-1} H̄_m^T βe_1 = x_0 + V_m y_m
where y_m minimizes ||βe_1 - H̄_m y||_2 over y ∈ R^m. Therefore (Generalized Minimal Residual method (GMRES) [Saad-Schultz, 1983]):
    x_m = x_0 + V_m y_m   where   y_m : min_y ||βe_1 - H̄_m y||_2
Equivalent methods: Axelsson's CGLS, Orthomin (1980), Orthodir, GCR.

Restarting and Truncating

Difficulty: As m increases, storage and work per step increase fast.
First remedy: Restarting. Fix the dimension m of the subspace.

ALGORITHM 3: Restarted GMRES (resp. Arnoldi)
1. Start/Restart: Compute r_0 = b - Ax_0, and v_1 = r_0/(β := ||r_0||_2).
2. Arnoldi Process: generate H̄_m and V_m.
3. Compute y_m = H_m^{-1} βe_1 (FOM), or y_m = argmin_y ||βe_1 - H̄_m y||_2 (GMRES)
4. x_m = x_0 + V_m y_m
5. If ||r_m||_2 ≤ ε ||r_0||_2 stop, else set x_0 := x_m and go to 1.

Second remedy: Truncate the orthogonalization
Each v_j is made orthogonal to the previous k v_i's only; the formula for v_{j+1} is replaced by
    h_{j+1,j} v_{j+1} = Av_j - Σ_{i=j-k+1}^{j} h_ij v_i
x_m is still computed as x_m = x_0 + V_m H_m^{-1} βe_1.
It can be shown that this is again an oblique projection process.
IOM (Incomplete Orthogonalization Method) = replace the orthogonalization in FOM by the above truncated (or "incomplete") orthogonalization.

The direct version of IOM [DIOM]:
Writing the LU decomposition of H_m as H_m = L_m U_m, we get
    x_m = x_0 + V_m U_m^{-1} L_m^{-1} βe_1 ≡ x_0 + P_m z_m
Structure of L_m, U_m when k = 3: L_m is unit lower bidiagonal (ones on the diagonal and one subdiagonal), and U_m is upper triangular with only k = 3 nonzero diagonals (banded).
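Restarted GMRES (Algorithm 3) can be sketched as follows. For simplicity the small least-squares problem min ||βe_1 - H̄_m y||_2 is solved with a dense least-squares solver instead of the progressive Givens rotations discussed below; dense NumPy arrays and illustrative tolerances are assumed.

```python
import numpy as np

def gmres_restarted(A, b, x0, m=30, tol=1e-8, max_restarts=50):
    """Restarted GMRES(m): Arnoldi + small least-squares solve at each cycle."""
    x = np.array(x0, dtype=float)
    n = b.shape[0]
    for _ in range(max_restarts):
        r = b - A @ x
        beta = np.linalg.norm(r)
        if beta <= tol * np.linalg.norm(b):
            break
        # Arnoldi process: A V_m = V_{m+1} Hbar_m
        V = np.zeros((n, m + 1))
        H = np.zeros((m + 1, m))
        V[:, 0] = r / beta
        k = m
        for j in range(m):
            w = A @ V[:, j]
            for i in range(j + 1):
                H[i, j] = w @ V[:, i]
                w -= H[i, j] * V[:, i]
            H[j + 1, j] = np.linalg.norm(w)
            if H[j + 1, j] < 1e-14:        # happy breakdown
                k = j + 1
                break
            V[:, j + 1] = w / H[j + 1, j]
        # y_m = argmin || beta e_1 - Hbar_m y ||_2
        e1 = np.zeros(k + 1)
        e1[0] = beta
        y, *_ = np.linalg.lstsq(H[: k + 1, :k], e1, rcond=None)
        x = x + V[:, :k] @ y
    return x
```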
With P_m = [p_1, ..., p_m] ≡ V_m U_m^{-1} and z_m ≡ L_m^{-1} βe_1, the columns of P_m and the entries of z_m satisfy
    p_m = u_mm^{-1} [ v_m - Σ_{i=m-k+1}^{m-1} u_im p_i ],      z_m = [ z_{m-1} ; ζ_m ]
Can update x_m at each step:
    x_m = x_{m-1} + ζ_m p_m

Note: several existing pairs of methods have a similar link: they are based on the LU, or other, factorizations of the H_m matrix.
- CG-like formulation of IOM called DIOM [Saad, 1982]
- ORTHORES(k) [Young & Jea '82] equivalent to DIOM(k)
- SYMMLQ [Paige and Saunders, '77] uses the LQ factorization of H_m.
Can incorporate partial pivoting in the LU factorization of H_m.

Some implementation details: GMRES

Issue 1: how to solve the least-squares problem?
Issue 2: how to compute the residual norm (without computing the solution at each step)?
Several solutions to both issues. Simplest: use Givens rotations.
Recall: we want to solve the least-squares problem
    min_y ||βe_1 - H̄_m y||_2
Transform the problem into an upper triangular one.

Rotation matrices of dimension m+1. Define (with s_i^2 + c_i^2 = 1) Ω_i as the identity except in rows/columns i and i+1, where it contains the 2x2 rotation block

    [  c_i   s_i ]   <- row i
    [ -s_i   c_i ]   <- row i+1

Multiply H̄_m and the right-hand side ḡ ≡ βe_1 by a sequence of such matrices from the left; s_i, c_i are selected to eliminate h_{i+1,i}. For example, with m = 5:

           | h11 h12 h13 h14 h15 |
           | h21 h22 h23 h24 h25 |
    H̄_5 =  |     h32 h33 h34 h35 |        ḡ = (β, 0, 0, 0, 0, 0)^T
           |         h43 h44 h45 |
           |             h54 h55 |
           |                 h65 |

1st rotation: Ω_1 acts on rows 1 and 2 (identity elsewhere), with
    s_1 = h21 / sqrt(h11^2 + h21^2),    c_1 = h11 / sqrt(h11^2 + h21^2)
After this rotation, the subdiagonal entry h21 of H̄_m^(1) = Ω_1 H̄_m is eliminated (only the first two rows are modified). Repeat with Ω_2, ..., Ω_i. Result for m = 5, after five rotations:

                           | r11 r12 r13 r14 r15 |
                           |     r22 r23 r24 r25 |
    H̄_5^(5) ≡ R̄_5  =      |         r33 r34 r35 |
                           |             r44 r45 |
                           |                 r55 |
                           |   0   0   0   0   0 |

    ḡ_1 = (c_1 β, -s_1 β, 0, ..., 0)^T,      ḡ_5 = (γ_1, γ_2, γ_3, ..., γ_6)^T

Define Q_m = Ω_m Ω_{m-1} ... Ω_1 and
    R̄_m ≡ H̄_m^(m) = Q_m H̄_m,     ḡ_m = Q_m (βe_1) = (γ_1, ..., γ_{m+1})^T
Since Q_m is unitary,
    min ||βe_1 - H̄_m y||_2 = min ||ḡ_m - R̄_m y||_2
Deleting the last row of R̄_m and the last entry of ḡ_m gives R_m and g_m; solve the resulting triangular system
    R_m y_m = g_m

PROPOSITION:
1. The rank of AV_m is equal to the rank of R_m. In particular, if r_mm = 0 then A must be singular.
2. The vector y_m which minimizes ||βe_1 - H̄_m y||_2 is given by y_m = R_m^{-1} g_m.
3. The residual vector at step m satisfies
    b - Ax_m = V_{m+1}(βe_1 - H̄_m y_m) = V_{m+1} Q_m^T (γ_{m+1} e_{m+1})
and, as a result,
    ||b - Ax_m||_2 = |γ_{m+1}|

THE SYMMETRIC CASE: Observation

Observe: when A is real symmetric, then in Arnoldi's method H_m = V_m^T A V_m must be symmetric. Therefore
Theorem. When Arnoldi's algorithm is applied to a (real) symmetric matrix, the matrix H_m is symmetric tridiagonal:
    h_ij = 0 for 1 ≤ i < j-1,   and   h_{j,j+1} = h_{j+1,j},  j = 1, ..., m
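The rotation-based reduction just described can be sketched as follows, assuming a dense (m+1) x m Hessenberg matrix; for clarity the rotations are applied all at once at the end rather than progressively, one new column per Arnoldi step, as in an actual GMRES implementation.

```python
import numpy as np

def solve_hessenberg_lsq(Hbar, beta):
    """Reduce min ||beta e_1 - Hbar y||_2 with Givens rotations; return y_m and |gamma_{m+1}|."""
    m = Hbar.shape[1]
    R = np.array(Hbar, dtype=float)
    g = np.zeros(m + 1)
    g[0] = beta
    for i in range(m):
        # rotation in rows (i, i+1) chosen to eliminate R[i+1, i]
        denom = np.hypot(R[i, i], R[i + 1, i])
        c, s = R[i, i] / denom, R[i + 1, i] / denom
        for col in range(i, m):
            t1, t2 = R[i, col], R[i + 1, col]
            R[i, col] = c * t1 + s * t2
            R[i + 1, col] = -s * t1 + c * t2
        g[i], g[i + 1] = c * g[i] + s * g[i + 1], -s * g[i] + c * g[i + 1]
    y = np.linalg.solve(np.triu(R[:m, :m]), g[:m])   # triangular solve R_m y = g_m
    return y, abs(g[m])                              # |gamma_{m+1}| = residual norm
```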
We can write

          | α_1  β_2                      |
          | β_2  α_2  β_3                 |
    H_m = |      β_3  α_3  β_4            |
          |         .    .    .           |
          |              β_m   α_m        |

The v_i's satisfy a three-term recurrence [Lanczos Algorithm]:
    β_{j+1} v_{j+1} = Av_j - α_j v_j - β_j v_{j-1}
This is a simplified version of Arnoldi's algorithm for symmetric systems:
Symmetric matrix + Arnoldi = Symmetric Lanczos

The Lanczos algorithm

ALGORITHM 4: Lanczos
1. Choose an initial vector v_1 of unit norm. Set β_1 ≡ 0, v_0 ≡ 0
2. For j = 1, 2, ..., m Do:
3.   w_j := Av_j - β_j v_{j-1}
4.   α_j := (w_j, v_j)
5.   w_j := w_j - α_j v_j
6.   β_{j+1} := ||w_j||_2. If β_{j+1} = 0 then Stop
7.   v_{j+1} := w_j / β_{j+1}
8. EndDo

Lanczos algorithm for linear systems

Usual orthogonal projection method setting:
- L_m = K_m = span{r_0, Ar_0, ..., A^{m-1} r_0}
- Basis V_m = [v_1, ..., v_m] of K_m generated by the Lanczos algorithm
Three different possible implementations: (1) Arnoldi-like; (2) exploit the tridiagonal nature of H_m (DIOM); (3) conjugate gradient.

ALGORITHM 5: Lanczos Method for Linear Systems
1. Compute r_0 = b - Ax_0, β := ||r_0||_2, and v_1 := r_0/β
2. For j = 1, 2, ..., m Do:
3.   w_j = Av_j - β_j v_{j-1}   (if j = 1 set β_1 ≡ 0, v_0 ≡ 0)
4.   α_j = (w_j, v_j)
5.   w_j := w_j - α_j v_j
6.   β_{j+1} = ||w_j||_2. If β_{j+1} = 0 set m := j and go to 9
7.   v_{j+1} = w_j / β_{j+1}
8. EndDo
9. Set T_m = tridiag(β_i, α_i, β_{i+1}), and V_m = [v_1, ..., v_m].
10. Compute y_m = T_m^{-1}(βe_1) and x_m = x_0 + V_m y_m
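A minimal sketch of the symmetric Lanczos process (Algorithm 4); dense NumPy arrays, no reorthogonalization, and an illustrative breakdown tolerance are assumed.

```python
import numpy as np

def lanczos(A, v1, m):
    """Return V (n x m) and the tridiagonal entries alpha (m,), beta (m-1,)."""
    n = v1.shape[0]
    V = np.zeros((n, m))
    alpha = np.zeros(m)
    beta = np.zeros(m - 1)
    V[:, 0] = v1 / np.linalg.norm(v1)
    v_prev = np.zeros(n)      # v_0 = 0
    b_prev = 0.0              # beta_1 = 0
    for j in range(m):
        w = A @ V[:, j] - b_prev * v_prev
        alpha[j] = w @ V[:, j]
        w -= alpha[j] * V[:, j]
        if j == m - 1:
            break
        b_next = np.linalg.norm(w)
        if b_next < 1e-14:    # invariant subspace found
            return V[:, : j + 1], alpha[: j + 1], beta[:j]
        beta[j] = b_next
        v_prev, b_prev = V[:, j], b_next
        V[:, j + 1] = w / b_next
    return V, alpha, beta
```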
ALGORITHM 6: D-Lanczos
1. Compute r_0 = b - Ax_0, ζ_1 := β := ||r_0||_2, and v_1 := r_0/β
2. Set λ_1 = β_1 = 0, p_0 = 0
3. For m = 1, 2, ..., until convergence Do:
4.   Compute w := Av_m - β_m v_{m-1} and α_m = (w, v_m)
5.   If m > 1 then compute λ_m = β_m / η_{m-1} and ζ_m = -λ_m ζ_{m-1}
6.   η_m = α_m - λ_m β_m
7.   p_m = η_m^{-1} (v_m - β_m p_{m-1})
8.   x_m = x_{m-1} + ζ_m p_m
9.   If x_m has converged then Stop
10.  w := w - α_m v_m
11.  β_{m+1} = ||w||_2, v_{m+1} = w / β_{m+1}
12. EndDo

The Conjugate Gradient Algorithm (A S.P.D.)

Note: the p_i's are A-orthogonal, the r_i's are orthogonal, and we have x_m = x_{m-1} + ξ_m p_m.
So there must be an update of the form:
1. p_m = r_{m-1} + β_m p_{m-1}
2. x_m = x_{m-1} + ξ_m p_m
3. r_m = r_{m-1} - ξ_m A p_m

The Conjugate Gradient Algorithm (A S.P.D.)

1. Start: r_0 := b - Ax_0, p_0 := r_0.
2. Iterate: until convergence do,
   (a) α_j := (r_j, r_j)/(Ap_j, p_j)
   (b) x_{j+1} := x_j + α_j p_j
   (c) r_{j+1} := r_j - α_j Ap_j
   (d) β_j := (r_{j+1}, r_{j+1})/(r_j, r_j)
   (e) p_{j+1} := r_{j+1} + β_j p_j
- r_j is a scalar multiple of v_{j+1}. The r_j's are orthogonal.
- The p_j's are A-conjugate, i.e., (Ap_i, p_j) = 0 for i ≠ j.
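The CG algorithm above in code, as a minimal sketch for SPD A with dense NumPy arrays; the stopping test and iteration cap are illustrative.

```python
import numpy as np

def conjugate_gradient(A, b, x0, tol=1e-10, maxit=1000):
    """Conjugate Gradient for SPD A."""
    x = np.array(x0, dtype=float)
    r = b - A @ x
    p = r.copy()
    rho = r @ r
    for _ in range(maxit):
        if np.sqrt(rho) <= tol * np.linalg.norm(b):
            break
        Ap = A @ p
        alpha = rho / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rho_new = r @ r
        p = r + (rho_new / rho) * p
        rho = rho_new
    return x
```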
METHODS BASED ON LANCZOS BIORTHOGONALIZATION

ALGORITHM 7: The Lanczos Bi-Orthogonalization Procedure
1. Choose two vectors v_1, w_1 such that (v_1, w_1) = 1.
2. Set β_1 = δ_1 ≡ 0, w_0 = v_0 ≡ 0
3. For j = 1, 2, ..., m Do:
4.   α_j = (Av_j, w_j)
5.   v̂_{j+1} = Av_j - α_j v_j - β_j v_{j-1}
6.   ŵ_{j+1} = A^T w_j - α_j w_j - δ_j w_{j-1}
7.   δ_{j+1} = |(v̂_{j+1}, ŵ_{j+1})|^{1/2}. If δ_{j+1} = 0 Stop
8.   β_{j+1} = (v̂_{j+1}, ŵ_{j+1}) / δ_{j+1}
9.   w_{j+1} = ŵ_{j+1} / β_{j+1}
10.  v_{j+1} = v̂_{j+1} / δ_{j+1}
11. EndDo

- Extension of the symmetric Lanczos algorithm.
- Builds a pair of biorthogonal bases for the two subspaces K_m(A, v_1) and K_m(A^T, w_1): v_i ∈ K_m(A, v_1) and w_j ∈ K_m(A^T, w_1).
- Different ways to choose δ_{j+1}, β_{j+1} in lines 7 and 8.

Let

          | α_1  β_2                              |
          | δ_2  α_2  β_3                         |
    T_m = |    .     .     .                      |
          |      δ_{m-1}  α_{m-1}  β_m            |
          |               δ_m      α_m            |

If the algorithm does not break down before step m, then the vectors v_i, i = 1, ..., m, and w_j, j = 1, ..., m, are biorthogonal, i.e.,
    (v_j, w_i) = δ_ij,  1 ≤ i, j ≤ m.
Moreover, {v_i}_{i=1,...,m} is a basis of K_m(A, v_1), {w_i}_{i=1,...,m} is a basis of K_m(A^T, w_1), and
    A V_m = V_m T_m + δ_{m+1} v_{m+1} e_m^T,
    A^T W_m = W_m T_m^T + β_{m+1} w_{m+1} e_m^T,
    W_m^T A V_m = T_m.

The Lanczos Algorithm for Linear Systems

ALGORITHM 8: Lanczos Algorithm for Linear Systems
1. Compute r_0 = b - Ax_0 and β := ||r_0||_2
2. Run m steps of the nonsymmetric Lanczos Algorithm, i.e.,
3. Start with v_1 := r_0/β, and any w_1 such that (v_1, w_1) = 1
4. Generate the Lanczos vectors v_1, ..., v_m, w_1, ..., w_m
5. and the tridiagonal matrix T_m from Algorithm 7.
6. Compute y_m = T_m^{-1}(βe_1) and x_m := x_0 + V_m y_m.

BCG can be derived from the Lanczos Algorithm similarly to CG in the symmetric case.
The BCG and QMR Algorithms

Let T_m = L_m U_m (LU factorization of T_m). Define P_m = V_m U_m^{-1}. Then the solution is
    x_m = x_0 + V_m T_m^{-1}(βe_1) = x_0 + V_m U_m^{-1} L_m^{-1}(βe_1) = x_0 + P_m L_m^{-1}(βe_1)
- x_m is updatable from x_{m-1}, similar to the CG algorithm.
- r_j and r*_j are in the same direction as v_{j+1} and w_{j+1} respectively, so they form a biorthogonal sequence.
- The p_i's and p*_i's are A-conjugate.
Utilizing this information, a CG-like algorithm can be easily derived from the Lanczos procedure.

ALGORITHM 9: BiConjugate Gradient (BCG)
1. Compute r_0 := b - Ax_0. Choose r*_0 such that (r_0, r*_0) ≠ 0.
2. Set p_0 := r_0, p*_0 := r*_0
3. For j = 0, 1, ..., until convergence Do:
4.   α_j := (r_j, r*_j)/(Ap_j, p*_j)
5.   x_{j+1} := x_j + α_j p_j
6.   r_{j+1} := r_j - α_j Ap_j
7.   r*_{j+1} := r*_j - α_j A^T p*_j
8.   β_j := (r_{j+1}, r*_{j+1})/(r_j, r*_j)
9.   p_{j+1} := r_{j+1} + β_j p_j
10.  p*_{j+1} := r*_{j+1} + β_j p*_j
11. EndDo

Quasi-Minimal Residual Algorithm

The Lanczos algorithm gives the relations
    A V_m = V_{m+1} T̄_m
with T̄_m = the (m+1) x m tridiagonal matrix
    T̄_m = [ T_m ; δ_{m+1} e_m^T ]
Let v_1 ≡ r_0/β and x = x_0 + V_m y. The residual norm ||b - Ax||_2 is
    ||r_0 - A V_m y||_2 = ||β v_1 - V_{m+1} T̄_m y||_2 = ||V_{m+1} (βe_1 - T̄_m y)||_2
The column-vectors of V_{m+1} are not orthonormal (unlike GMRES).
But: it is a reasonable idea to minimize the function
    J(y) ≡ ||βe_1 - T̄_m y||_2
This gives the Quasi-Minimal Residual Algorithm (Freund, 1990).

ALGORITHM 10: QMR
1. Compute r_0 = b - Ax_0 and γ_1 := ||r_0||_2, w_1 := v_1 := r_0/γ_1
2. For m = 1, 2, ..., until convergence Do:
3.   Compute α_m, δ_{m+1} and v_{m+1}, w_{m+1} as in the Lanczos Algorithm (Algorithm 7)
4.   Update the QR factorization of T̄_m, i.e.,
5.   Apply Ω_i, i = m-2, m-1 to the m-th column of T̄_m
6.   Compute the rotation coefficients c_m, s_m
7.   Apply rotation Ω_m to T̄_m and ḡ_m, i.e., compute:
8.     γ_{m+1} := -s_m γ_m;  γ_m := c_m γ_m;  and  α_m := c_m α_m + s_m δ_{m+1}
9.   p_m = ( v_m - Σ_{i=m-2}^{m-1} t_im p_i ) / t_mm
10.  x_m = x_{m-1} + γ_m p_m
11.  If |γ_{m+1}| is small enough then Stop
12. EndDo
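A minimal sketch of BCG (Algorithm 9); dense NumPy arrays are assumed, r*_0 is simply taken equal to r_0 for illustration, and no safeguards against breakdown are included.

```python
import numpy as np

def bcg(A, b, x0, tol=1e-10, maxit=1000):
    """BiConjugate Gradient for general nonsymmetric A."""
    x = np.array(x0, dtype=float)
    r = b - A @ x
    rs = r.copy()                   # shadow residual r*_0 with (r_0, r*_0) != 0
    p, ps = r.copy(), rs.copy()
    for _ in range(maxit):
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        Ap = A @ p
        alpha = (r @ rs) / (Ap @ ps)
        x += alpha * p
        r_new = r - alpha * Ap
        rs_new = rs - alpha * (A.T @ ps)
        beta = (r_new @ rs_new) / (r @ rs)
        p = r_new + beta * p
        ps = rs_new + beta * ps
        r, rs = r_new, rs_new
    return x
```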
Transpose-Free Variants

BCG and QMR require a matrix-by-vector product with A and with A^T at each step. The products with A^T do not contribute directly to x_m; they only serve to determine the scalars (α_j and β_j in BCG).
QUESTION: is it possible to bypass the use of A^T?
Motivation: in nonlinear equations, A is often not available explicitly but only through the Frechet derivative:
    J(u_k) v = [ F(u_k + εv) - F(u_k) ] / ε

Conjugate Gradient Squared

* Clever variant of BCG which avoids using A^T [Sonneveld, 1984].
In BCG:  r_i = ρ_i(A) r_0, where ρ_i = polynomial of degree i.
In CGS:  r_i = ρ_i^2(A) r_0

Define
    r_j = φ_j(A) r_0,   p_j = π_j(A) r_0,   r*_j = φ_j(A^T) r*_0,   p*_j = π_j(A^T) r*_0.
The scalar α_j in BCG is given by
    α_j = (φ_j(A) r_0, φ_j(A^T) r*_0) / (A π_j(A) r_0, π_j(A^T) r*_0)
        = (φ_j^2(A) r_0, r*_0) / (A π_j^2(A) r_0, r*_0)
Is it possible to get a recursion for the φ_j^2(A) r_0 and π_j^2(A) r_0?
Square the equalities φ_{j+1}(t) = φ_j(t) - α_j t π_j(t),  π_{j+1}(t) = φ_{j+1}(t) + β_j π_j(t):
    φ_{j+1}^2(t) = φ_j^2(t) - 2 α_j t π_j(t) φ_j(t) + α_j^2 t^2 π_j^2(t),
    π_{j+1}^2(t) = φ_{j+1}^2(t) + 2 β_j φ_{j+1}(t) π_j(t) + β_j^2 π_j^2(t).
Problem: cross terms.

Solution: let φ_{j+1}(t) π_j(t) be a third member of the recurrence. For π_j(t) φ_j(t), note:
    φ_j(t) π_j(t) = φ_j(t) (φ_j(t) + β_{j-1} π_{j-1}(t)) = φ_j^2(t) + β_{j-1} φ_j(t) π_{j-1}(t).
Result:
    φ_{j+1}^2   = φ_j^2 - α_j t ( 2 φ_j^2 + 2 β_{j-1} φ_j π_{j-1} - α_j t π_j^2 )
    φ_{j+1} π_j = φ_j^2 + β_{j-1} φ_j π_{j-1} - α_j t π_j^2
    π_{j+1}^2   = φ_{j+1}^2 + 2 β_j φ_{j+1} π_j + β_j^2 π_j^2
Define:
    r_j = φ_j^2(A) r_0,   p_j = π_j^2(A) r_0,   q_j = φ_{j+1}(A) π_j(A) r_0
The recurrences become:
    r_{j+1} = r_j - α_j A ( 2 r_j + 2 β_{j-1} q_{j-1} - α_j A p_j ),
    q_j     = r_j + β_{j-1} q_{j-1} - α_j A p_j,
    p_{j+1} = r_{j+1} + 2 β_j q_j + β_j^2 p_j.
Define the auxiliary vector
    d_j = 2 r_j + 2 β_{j-1} q_{j-1} - α_j A p_j
With one more auxiliary vector, u_j = r_j + β_{j-1} q_{j-1}, we get
    d_j = u_j + q_j,    q_j = u_j - α_j A p_j,    p_{j+1} = u_{j+1} + β_j (q_j + β_j p_j),
so the vector d_j is no longer needed.

Sequence of operations to compute the approximate solution, starting with r_0 := b - Ax_0, p_0 := r_0, q_0 := 0, β_0 := 0:
1. α_j = (r_j, r*_0)/(Ap_j, r*_0)
2. d_j = 2 r_j + 2 β_{j-1} q_{j-1} - α_j Ap_j
3. q_j = r_j + β_{j-1} q_{j-1} - α_j Ap_j
4. x_{j+1} = x_j + α_j d_j
5. r_{j+1} = r_j - α_j A d_j
6. β_j = (r_{j+1}, r*_0)/(r_j, r*_0)
7. p_{j+1} = r_{j+1} + β_j (2 q_j + β_j p_j)

ALGORITHM 11: Conjugate Gradient Squared
1. Compute r_0 := b - Ax_0; r*_0 arbitrary.
2. Set p_0 := u_0 := r_0.
3. For j = 0, 1, 2, ..., until convergence Do:
4.   α_j = (r_j, r*_0)/(Ap_j, r*_0)
5.   q_j = u_j - α_j Ap_j
6.   x_{j+1} = x_j + α_j (u_j + q_j)
7.   r_{j+1} = r_j - α_j A(u_j + q_j)
8.   β_j = (r_{j+1}, r*_0)/(r_j, r*_0)
9.   u_{j+1} = r_{j+1} + β_j q_j
10.  p_{j+1} = u_{j+1} + β_j (q_j + β_j p_j)
11. EndDo
Note: no matrix-by-vector products with A^T, but two matrix-by-vector products with A at each step.

Correspondence with the BCG polynomials, where r_i(t) = residual polynomial at step i for BCG, i.e., r_i = r_i(A) r_0, and p_i(t) = conjugate direction polynomial at step i, i.e., p_i = p_i(A) r_0:
    Vector   |   Polynomial in BCG
    r_i      |   r_i(t)^2
    p_i      |   p_i(t)^2
    u_i      |   r_i(t) p_i(t)
    q_i      |   r_{i+1}(t) p_i(t)
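A minimal sketch of CGS (Algorithm 11) under the same assumptions as the earlier sketches (dense NumPy arrays, r*_0 taken equal to r_0, no breakdown or irregular-convergence safeguards).

```python
import numpy as np

def cgs(A, b, x0, tol=1e-10, maxit=1000):
    """Conjugate Gradient Squared: transpose-free variant of BCG."""
    x = np.array(x0, dtype=float)
    r = b - A @ x
    rstar = r.copy()               # arbitrary r*_0 with (r_0, r*_0) != 0
    p = r.copy()
    u = r.copy()
    for _ in range(maxit):
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        Ap = A @ p
        alpha = (r @ rstar) / (Ap @ rstar)
        q = u - alpha * Ap
        x += alpha * (u + q)
        r_new = r - alpha * (A @ (u + q))
        beta = (r_new @ rstar) / (r @ rstar)
        u = r_new + beta * q
        p = u + beta * (q + beta * p)
        r = r_new
    return x
```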
BCGSTAB (van der Vorst, 1992)

- In CGS: the residual polynomial of BCG is squared, which leads to bad behavior in case of irregular convergence.
- Bi-Conjugate Gradient Stabilized (BCGSTAB) = a variation of CGS which avoids this difficulty.
- Derivation similar to CGS.
- Residuals in BCGSTAB are of the form
    r_j = ψ_j(A) φ_j(A) r_0
  in which φ_j(t) = BCG residual polynomial, and ψ_j(t) = a new polynomial defined recursively as
    ψ_{j+1}(t) = (1 - ω_j t) ψ_j(t)
- ω_j chosen to smooth convergence [steepest descent step]

ALGORITHM 12: BCGSTAB
1. Compute r_0 := b - Ax_0; r*_0 arbitrary;
2. p_0 := r_0.
3. For j = 0, 1, ..., until convergence Do:
4.   α_j := (r_j, r*_0)/(Ap_j, r*_0)
5.   s_j := r_j - α_j Ap_j
6.   ω_j := (As_j, s_j)/(As_j, As_j)
7.   x_{j+1} := x_j + α_j p_j + ω_j s_j
8.   r_{j+1} := s_j - ω_j As_j
9.   β_j := (r_{j+1}, r*_0)/(r_j, r*_0) · (α_j/ω_j)
10.  p_{j+1} := r_{j+1} + β_j (p_j - ω_j Ap_j)
11. EndDo

THEORY

Convergence Theory for CG

Approximation of the form x = x_0 + p_{m-1}(A) r_0, with x_0 = initial guess and r_0 = b - Ax_0.
Optimality property: x_m minimizes ||x* - x||_A over x_0 + K_m.
Consequence: Standard result. Let x_m = m-th CG iterate, x* = exact solution, and
    η = λ_min / (λ_max - λ_min)
Then:
    ||x* - x_m||_A ≤ ||x* - x_0||_A / T_m(1 + 2η)
where T_m = Chebyshev polynomial of degree m.
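The Chebyshev-based bound above can be checked numerically. A small sketch with an illustrative SPD test matrix; it uses T_m(t) = cosh(m arccosh t) for t > 1 and includes a plain CG loop so that the example stays self-contained.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 200, 30
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
lam = np.linspace(1.0, 100.0, n)              # eigenvalues of the SPD test matrix
A = (Q * lam) @ Q.T
b = rng.standard_normal(n)
x_star = np.linalg.solve(A, b)

def a_norm(v):
    return np.sqrt(v @ (A @ v))

# m steps of CG from x_0 = 0
x = np.zeros(n)
r = b.copy()
p = r.copy()
rho = r @ r
for _ in range(m):
    Ap = A @ p
    alpha = rho / (p @ Ap)
    x += alpha * p
    r -= alpha * Ap
    rho, rho_old = r @ r, rho
    p = r + (rho / rho_old) * p

eta = lam.min() / (lam.max() - lam.min())
# bound ||x* - x_m||_A <= ||x* - x_0||_A / T_m(1 + 2*eta), here with x_0 = 0
bound = a_norm(x_star) / np.cosh(m * np.arccosh(1 + 2 * eta))
print("A-norm error:", a_norm(x_star - x), " Chebyshev bound:", bound)
```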
THEORY FOR NONHERMITIAN CASE

Convergence results for the nonsymmetric case: much more difficult!
- No convincing results on global convergence for many algorithms (Bi-CG, FOM, etc.).
- Can get general a priori / a posteriori error bounds.
- Methods based on minimum residual are better understood.
- If (A + A^T) is positive definite ((Ax, x) > 0 for all x ≠ 0), all minimum residual-type methods (ORTHOMIN, ORTHODIR, GCR, GMRES, ...), plus their restarted and truncated versions, converge.
- Convergence results based on comparison with steepest descent [Eisenstat, Elman, Schultz 1982] are not sharp.
- Minimum residual methods: if A = XΛX^{-1}, Λ diagonal, then
    ||b - Ax_m||_2 ≤ Cond_2(X)  min_{p ∈ P_m, p(0)=1}  max_{λ ∈ Λ(A)} |p(λ)|  ||b - Ax_0||_2
  (P_m = set of polynomials of degree ≤ m, Λ(A) = spectrum of A)

Two useful projectors

Let P be the orthogonal projector onto K and Q be the (oblique) projector onto K and orthogonally to L:
    Px ∈ K,  x - Px ⊥ K
    Qx ∈ K,  x - Qx ⊥ L
[Figure: the orthogonal projector P and the oblique projector Q]

The approximate problem in terms of P and Q

The approximate problem amounts to solving
    Q(b - Ax) = 0,  x ∈ K
or, in operator form,
    Q(b - APx) = 0
Question: what accuracy can one expect?
If x* is the exact solution, then we cannot get better accuracy than ||(I - P)x*||_2, i.e.,
    ||x̃ - x*||_2 ≥ ||(I - P)x*||_2
THEOREM. Let γ = ||QA(I - P)||_2 and assume that b belongs to K. Then the residual norm of the exact solution x* for the (approximate) linear operator A_m satisfies the inequality
    ||b - A_m x*||_2 ≤ γ ||(I - P)x*||_2
In other words, if the approximate problem is not poorly conditioned and if ||(I - P)x*||_2 is small, then we will obtain a good approximate solution.

Methods based on the normal equations

It is possible to obtain the solution of Ax = b from the equivalent systems:
    A^T A x = A^T b     or     A A^T y = b,  x = A^T y
- Methods based on these approaches are usually slower than the previous ones (the condition number of the system is squared).
- Exception: when A is strongly indefinite (extreme case: A is orthogonal, A^T A = I, convergence in 1 step).

CGNR and CGNE

Can use CG to solve the normal equations. Two well-known options:
(1) CGNR: Conjugate Gradient method on A^T A x = A^T b
(2) CGNE: Let x = A^T y and use the Conjugate Gradient method on A A^T y = b
- Different optimality properties
- Various efficient formulations in both cases

ALGORITHM 13: CGNR
1. Compute r_0 = b - Ax_0, z_0 = A^T r_0, p_0 = z_0.
2. For i = 0, ..., until convergence Do:
3.   w_i = Ap_i
4.   α_i = ||z_i||_2^2 / ||w_i||_2^2
5.   x_{i+1} = x_i + α_i p_i
6.   r_{i+1} = r_i - α_i w_i
7.   z_{i+1} = A^T r_{i+1}
8.   β_i = ||z_{i+1}||_2^2 / ||z_i||_2^2
9.   p_{i+1} = z_{i+1} + β_i p_i
10. EndDo
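A minimal sketch of CGNR (Algorithm 13), i.e., CG on A^T A x = A^T b without forming A^T A explicitly; dense NumPy arrays and an illustrative stopping test on ||A^T r||_2 are assumed.

```python
import numpy as np

def cgnr(A, b, x0, tol=1e-10, maxit=1000):
    """CGNR: Conjugate Gradient on the normal equations A^T A x = A^T b."""
    x = np.array(x0, dtype=float)
    r = b - A @ x
    z = A.T @ r
    p = z.copy()
    zz = z @ z
    for _ in range(maxit):
        if np.sqrt(zz) <= tol:
            break
        w = A @ p
        alpha = zz / (w @ w)
        x += alpha * p
        r -= alpha * w
        z = A.T @ r
        zz_new = z @ z
        p = z + (zz_new / zz) * p
        zz = zz_new
    return x
```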
CGNR: The approximation x_m minimizes the residual norm ||b - Ax||_2 over the affine Krylov subspace
    x_0 + span{A^T r_0, (A^T A) A^T r_0, ..., (A^T A)^{m-1} A^T r_0},
where r_0 ≡ b - Ax_0. The difference with GMRES is the subspace in which the residual norm is minimized. For GMRES the subspace is x_0 + K_m(A, r_0).

ALGORITHM 14: CGNE (Craig's Method)
1. Compute r_0 = b - Ax_0, p_0 = A^T r_0.
2. For i = 0, 1, ..., until convergence Do:
3.   α_i = (r_i, r_i)/(p_i, p_i)
4.   x_{i+1} = x_i + α_i p_i
5.   r_{i+1} = r_i - α_i Ap_i
6.   β_i = (r_{i+1}, r_{i+1})/(r_i, r_i)
7.   p_{i+1} = A^T r_{i+1} + β_i p_i
8. EndDo

CGNE produces the approximate solution x in the subspace
    x_0 + A^T K_m(A A^T, r_0) = x_0 + K_m(A^T A, A^T r_0)
which minimizes ||x* - x||_2, where x* = A^{-1} b and r_0 = b - Ax_0.
Note: same subspace as CGNR!

Block GMRES and Block Krylov Methods

Main motivation: to solve linear systems with several right-hand sides
    A x^(i) = b^(i),  i = 1, ..., p     or, in matrix form,     AX = B
Sometimes Block methods are used as a strategy for enhancing convergence even for the case p = 1.
Let R_0 ≡ [r_0^(1), r_0^(2), ..., r_0^(p)], where each column is r_0^(i) = b^(i) - A x_0^(i).
Krylov methods find an approximation to X from the subspace
    K_m(A, R_0) = span{R_0, A R_0, ..., A^{m-1} R_0}
For example, Block-GMRES (BGMRES) finds X to minimize ||B - AX||_F for X ∈ X_0 + K_m(A, R_0).
- Various implementations of BGMRES exist.
- The simplest one is based on Ruhe's variant of the Block Arnoldi procedure.

ALGORITHM 15: Block Arnoldi (Ruhe's variant)
1. Choose p initial orthonormal vectors {v_i}_{i=1,...,p}.
2. For j = p, p+1, ..., m Do:
3.   Set k := j - p + 1;
4.   Compute w := Av_k;
5.   For i = 1, 2, ..., j Do:
6.     h_{i,k} := (w, v_i)
7.     w := w - h_{i,k} v_i
8.   EndDo
9.   Compute h_{j+1,k} := ||w||_2 and v_{j+1} := w/h_{j+1,k}.
10. EndDo

- For p = 1 this coincides with the standard Arnoldi process.
- Interesting feature: the dimension of the subspace need not be a multiple of the block size p.
- At the end of the algorithm, we have the relation
    A V_m = V_{m+p} H̄_m
  The matrix H̄_m is now of size (m+p) x m.
- Each approximate solution has the form
    x^(i) = x_0^(i) + V_m y^(i),
  where y^(i) must minimize the norm ||b^(i) - A x^(i)||_2. Plane rotations can be used for this purpose as in the standard GMRES [p rotations are needed for each step].
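A minimal sketch of Ruhe's variant of Block Arnoldi (Algorithm 15), with 0-based indices; dense NumPy arrays are assumed, the p initial orthonormal vectors are obtained from a QR factorization of the initial block, and breakdown is not handled.

```python
import numpy as np

def block_arnoldi_ruhe(A, R0, m):
    """Return V (n x (m+p)) and Hbar ((m+p) x m) with A V[:, :m] = V @ Hbar."""
    n, p = R0.shape
    V = np.zeros((n, m + p))
    H = np.zeros((m + p, m))
    V[:, :p], _ = np.linalg.qr(R0)           # p initial orthonormal vectors
    for j in range(p, m + p):                # produce vectors v_{p+1}, ..., v_{m+p}
        k = j - p                            # column of Hbar being built
        w = A @ V[:, k]
        for i in range(j):
            H[i, k] = w @ V[:, i]
            w -= H[i, k] * V[:, i]
        H[j, k] = np.linalg.norm(w)
        V[:, j] = w / H[j, k]
    return V, H
```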