Computational Linear Algebra
PD Dr. rer. nat. habil. Ralf-Peter Mundani
Computation in Engineering / BGU
Scientific Computing in Computer Science / INF
Winter Term 2018/19

Part 4: Iterative Methods

overview
definitions
splitting methods
projection and KRYLOV subspace methods
multigrid methods

basic concept
we consider linear systems of type
\[ Ax = b \tag{4.2.1} \]
with regular matrix $A \in \mathbb{R}^{n \times n}$ and right-hand side $b \in \mathbb{R}^n$
Definition 4.18
A projection method for solving (4.2.1) is a technique that computes approximate solutions $x_m \in x_0 + K_m$ under consideration of
\[ (b - Ax_m) \perp L_m, \tag{4.2.2} \]
where $x_0 \in \mathbb{R}^n$ is arbitrary and $K_m$ and $L_m$ represent m-dimensional subspaces of $\mathbb{R}^n$. Here, orthogonality is defined via the EUCLIDEAN dot product: $x \perp y \iff (x, y)_2 = 0$.

basic concept (cont'd)
observation
in case $K_m = L_m$, the residual vector $r_m = b - Ax_m$ is perpendicular to $K_m$; we obtain an orthogonal projection method and (4.2.2) is called GALERKIN condition
in case $K_m \neq L_m$, we obtain a skew projection method and (4.2.2) is called PETROV-GALERKIN condition

                                        splitting methods           projection methods
computation of approximated solutions   $x_m \in \mathbb{R}^n$      $x_m \in x_0 + K_m \subset \mathbb{R}^n$, $\dim K_m = m \le n$
computation method                      $x_m = M x_{m-1} + N b$     $b - Ax_m \perp L_m \subset \mathbb{R}^n$, $\dim L_m = m \le n$

basic concept (cont'd)
Definition 4.19
A KRYLOV subspace method is a projection method for solving (4.2.1), where $K_m$ represents the KRYLOV subspace
\[ K_m = K_m(A, r_0) = \mathrm{span}\{r_0, Ar_0, \dots, A^{m-1} r_0\} \]
with $r_0 = b - Ax_0$.
KRYLOV subspace methods are often described as a reformulation of a linear system into a minimisation problem
well-known methods are conjugate gradients (HESTENES & STIEFEL, 1952) and GMRES (SAAD & SCHULTZ, 1986)
both methods compute the optimal approximation $x_m \in x_0 + K_m$ w.r.t. (4.2.2) by incrementing the subspace dimension by one in every iteration
neglecting round-off errors, both methods would compute the exact solution after at most n iterations

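The spanning vectors in Definition 4.19 can be generated by repeated matrix-vector products. The following is a minimal sketch in plain Python; the helper names `matvec` and `krylov_vectors` and the 2x2 test data are illustrative assumptions, not part of the lecture.

```python
def matvec(A, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def krylov_vectors(A, r0, m):
    """Return the list [r0, A r0, ..., A^(m-1) r0] spanning K_m(A, r0)."""
    vectors = [list(r0)]
    for _ in range(m - 1):
        vectors.append(matvec(A, vectors[-1]))
    return vectors


# Hypothetical example data (not from the slides).
A = [[2.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x0 = [0.0, 0.0]
r0 = [bi - yi for bi, yi in zip(b, matvec(A, x0))]   # r_0 = b - A x_0
K = krylov_vectors(A, r0, 2)
print(K)
```

These raw power vectors tend to become nearly parallel as m grows, which is one reason practical methods work with an orthonormal basis of the same subspace instead.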
overview
CG method -> BiCG method: simultaneously consider $Ax = b$ and $A^T x = b$
CG method -> GMRES method: minimise the residual norm $\|b - Ax\|_2$
BiCG method -> CGS method: avoid multiplication with $A^T$
CGS method -> BiCGSTAB method: avoid oscillations of the residual
BiCG method and GMRES method -> QMR method: combination of BiCG and GMRES
QMR method -> TFQMR method: avoid multiplication with $A^T$

method of steepest descent
note: for further considerations, we assume the linear system (4.2.1) to exhibit a symmetric and positive definite (SPD) matrix
we further consider functions
\[ F : \mathbb{R}^n \to \mathbb{R}, \quad x \mapsto \tfrac{1}{2}(Ax, x)_2 - (b, x)_2 \tag{4.2.3} \]
and will first study some of their properties in order to derive the method
Lemma 4.20
Let $A \in \mathbb{R}^{n \times n}$ be symmetric, positive definite and $b \in \mathbb{R}^n$ given, then for a function F defined via (4.2.3) applies
\[ \hat{x} = \arg\min_{x \in \mathbb{R}^n} F(x) \quad \text{iff} \quad A\hat{x} = b. \]

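A short argument for Lemma 4.20 can be sketched as follows; it only uses the symmetry and positive definiteness of A assumed above. The perturbation vector $e$ is notation introduced here for illustration.

```latex
% Expand F around \hat{x} for an arbitrary perturbation e (A symmetric):
\begin{align*}
F(\hat{x} + e)
  &= \tfrac{1}{2}\bigl(A(\hat{x}+e), \hat{x}+e\bigr)_2 - (b, \hat{x}+e)_2 \\
  &= F(\hat{x}) + (A\hat{x} - b, e)_2 + \tfrac{1}{2}(Ae, e)_2 .
\end{align*}
% If A\hat{x} = b, the linear term vanishes and, since A is positive definite,
%   F(\hat{x}+e) - F(\hat{x}) = \tfrac{1}{2}(Ae, e)_2 > 0 \quad \text{for } e \neq 0,
% so \hat{x} is the unique minimiser.
% Conversely, if A\hat{x} \neq b, choosing e = -t\,(A\hat{x} - b) with small t > 0
% makes the linear term dominate, so F(\hat{x}+e) < F(\hat{x}).
```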
method of steepest descent (cont'd)
idea: we want to achieve a successive minimisation of F, starting from a point $x \in \mathbb{R}^n$, along particular directions $p \in \mathbb{R}^n$
hence, we define for $x, p \in \mathbb{R}^n$ a function
\[ f_{x,p} : \mathbb{R} \to \mathbb{R}, \quad \lambda \mapsto f_{x,p}(\lambda) := F(x + \lambda p) \]
Lemma and Definition 4.21
Let matrix $A \in \mathbb{R}^{n \times n}$ be symmetric, positive definite and vectors $x, p \in \mathbb{R}^n$ with $p \neq 0$ given, then
\[ \lambda_{opt} = \lambda_{opt}(x, p) := \arg\min_{\lambda \in \mathbb{R}} f_{x,p}(\lambda) = \frac{(r, p)_2}{(Ap, p)_2} \]
applies with $r := b - Ax$. Vector r is denoted as residual vector and its EUCLIDEAN norm $\|r\|_2$ as residual.

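The formula for $\lambda_{opt}$ follows from the fact that $f_{x,p}$ is a parabola in $\lambda$. A brief derivation, assuming A symmetric and positive definite as in the lemma:

```latex
\begin{align*}
f_{x,p}(\lambda) = F(x + \lambda p)
  &= F(x) + \lambda\,(Ax - b, p)_2 + \tfrac{1}{2}\lambda^2 (Ap, p)_2 \\
  &= F(x) - \lambda\,(r, p)_2 + \tfrac{1}{2}\lambda^2 (Ap, p)_2 , \\
f_{x,p}'(\lambda) &= -(r, p)_2 + \lambda\,(Ap, p)_2 = 0
  \;\Longleftrightarrow\; \lambda = \frac{(r, p)_2}{(Ap, p)_2} , \\
f_{x,p}''(\lambda) &= (Ap, p)_2 > 0 \quad \text{for } p \neq 0 ,
\end{align*}
% so the stationary point is the unique minimum.
```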
method of steepest descent (cont'd)
with given sequence $\{p_m\}_{m \in \mathbb{N}_0}$ of search directions out of $\mathbb{R}^n \setminus \{0\}$, we can determine a first method

basic solver
    choose $x_0$
    for m = 0, 1, ...
        $r_m = b - Ax_m$
        $\lambda_m = \frac{(r_m, p_m)_2}{(Ap_m, p_m)_2}$
        $x_{m+1} = x_m + \lambda_m p_m$

in order to complete our basic solver, we need a method to compute search directions $p_m$

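The basic solver can be sketched in a few lines once a rule for the search directions is supplied. The version below is only an illustration: the function names, the fixed step count, and the choice of cycling through the unit vectors are assumptions made here, not the lecture's choice of directions.

```python
def dot(u, v):
    """EUCLIDEAN dot product (u, v)_2."""
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(A, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [dot(row, x) for row in A]


def basic_solver(A, b, x0, direction, steps):
    """Line search along direction(m) != 0 for m = 0, ..., steps - 1."""
    x = list(x0)
    for m in range(steps):
        p = direction(m)
        r = [bi - yi for bi, yi in zip(b, matvec(A, x))]   # r_m = b - A x_m
        lam = dot(r, p) / dot(matvec(A, p), p)             # lambda_opt (4.21)
        x = [xi + lam * pi for xi, pi in zip(x, p)]        # x_{m+1}
    return x


def unit_directions(n):
    """Hypothetical direction rule: cycle through e_1, ..., e_n."""
    def direction(m):
        return [1.0 if i == m % n else 0.0 for i in range(n)]
    return direction


# Hypothetical SPD example (not from the slides); exact solution (1/11, 7/11).
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = basic_solver(A, b, [0.0, 0.0], unit_directions(2), 60)
print(x)
```

With this particular direction rule each step zeroes one component of the residual, so the iteration coincides with a GAUSS-SEIDEL sweep; other direction rules give other methods.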
method of steepest descent (cont'd)
further (w/o loss of generality), we request $\|p_m\|_2 = 1$
for $x \neq A^{-1}b$ we achieve a globally optimal choice via
\[ p = \frac{\hat{x} - x}{\|\hat{x} - x\|_2} \quad \text{with } \hat{x} = A^{-1}b, \]
as hereby follows with the definition of $\lambda_{opt}$ according to 4.21
\[ x + \lambda_{opt}\, p = x + \frac{(b - Ax, \hat{x} - x)_2}{(b - Ax, \hat{x} - x)_2}\, \|\hat{x} - x\|_2\, \frac{\hat{x} - x}{\|\hat{x} - x\|_2} = \hat{x} \]
however, this approach requires the knowledge of the exact solution $\hat{x}$ for computing search directions

method of steepest descent (cont'd)
restricting to local optimality, search directions can be computed with the negative gradient of F
here applies
\[ \nabla F(x) = \tfrac{1}{2}(A + A^T)x - b \overset{A\ \text{sym.}}{=} Ax - b = -r, \]
hence
\[ p := \begin{cases} r / \|r\|_2 & \text{for } r \neq 0 \\ 0 & \text{for } r = 0 \end{cases} \tag{4.2.4} \]
yields the direction of steepest descent
function F is strictly convex due to $\nabla^2 F(x) = A$ and A being SPD
it is obvious that $\hat{x} = A^{-1}b$, due to $\nabla F(\hat{x}) = 0$, represents the only and global minimum of F

method of steepest descent (cont'd)
hence, we obtain the method of steepest descent (a.k.a. gradient method)

    choose $x_0$
    for m = 0, 1, ...
        $r_m = b - Ax_m$
        if $r_m \neq 0$
            $\lambda_m = \frac{\|r_m\|_2^2}{(Ar_m, r_m)_2}$
        else
            $\lambda_m = 0$
        $x_{m+1} = x_m + \lambda_m r_m$

for practical applications, $r_0$ is computed outside the loop; inside the loop, with
\[ r_{m+1} = b - Ax_{m+1} = b - Ax_m - \lambda_m A r_m = r_m - \lambda_m A r_m, \]
one matrix-vector multiplication per iteration can be avoided

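A minimal sketch of the gradient method with the recursive residual update described above. The tolerance `tol`, the iteration cap `max_iter`, and the test system are assumptions introduced here; the slides test for $r_m \neq 0$ exactly, which is replaced by a threshold on $\|r_m\|_2$ for floating-point use.

```python
def dot(u, v):
    """EUCLIDEAN dot product (u, v)_2."""
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(A, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [dot(row, x) for row in A]


def steepest_descent(A, b, x0, tol=1e-12, max_iter=10000):
    """Gradient method for SPD A; returns (x, number of iterations)."""
    x = list(x0)
    r = [bi - yi for bi, yi in zip(b, matvec(A, x))]    # r_0, computed once
    for m in range(max_iter):
        rr = dot(r, r)
        if rr ** 0.5 <= tol:                            # assumed stopping rule
            return x, m
        Ar = matvec(A, r)                               # only mat-vec per step
        lam = rr / dot(Ar, r)                           # lambda_m
        x = [xi + lam * ri for xi, ri in zip(x, r)]     # x_{m+1}
        r = [ri - lam * ari for ri, ari in zip(r, Ar)]  # r_{m+1} = r_m - lam A r_m
    return x, max_iter


# Hypothetical SPD example (not from the slides); exact solution (1/11, 7/11).
x, iterations = steepest_descent([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0], [0.0, 0.0])
print(x, iterations)
```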
method of steepest descent (cont'd)
example: consider $Ax = b$ with
\[ A = \begin{pmatrix} 2 & 0 \\ 0 & 10 \end{pmatrix}, \quad b = \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \quad x_0 = \begin{pmatrix} 4 \\ 1.341641 \end{pmatrix} \]
thus, we get the following convergence for the method of steepest descent

 m    $x_{m,1}$       $x_{m,2}$       $\varepsilon_m := \|x_m - A^{-1}b\|_A$
 0    4.000000e+00    1.341641e+00    7.071068e+00
10    3.271049e-02    1.097143e-02    5.782453e-02
40    1.788827e-08    5.999910e-09    3.162230e-08
70    9.782499e-15    3.281150e-15    1.729318e-14
72    3.740893e-15    1.254734e-15    6.613026e-15

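The convergence table can be recomputed with a few lines of Python. The data used here, $A = \mathrm{diag}(2, 10)$, $b = 0$ and $x_0 = (4, \sqrt{1.8})^T$, are an assumption inferred from the eigenvalues $\lambda_1 = 2$, $\lambda_2 = 10$ and the starting values $x_0 = (4, 1.341641)^T$, $\varepsilon_0 = 7.071068$ given in the slides; the variable names are illustrative.

```python
import math


def dot(u, v):
    """EUCLIDEAN dot product (u, v)_2."""
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(A, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [dot(row, x) for row in A]


A = [[2.0, 0.0], [0.0, 10.0]]   # assumed from lambda_1 = 2, lambda_2 = 10
b = [0.0, 0.0]                  # assumed: exact solution is the origin
x = [4.0, math.sqrt(1.8)]       # x_0 = (4, 1.341641...)^T


def energy_norm(e):
    """||e||_A = sqrt((A e, e)_2)."""
    return math.sqrt(dot(matvec(A, e), e))


table = {}
for m in range(41):
    # A^{-1} b = 0 under the assumption above, so the error equals x_m.
    table[m] = (x[0], x[1], energy_norm(x))
    r = [bi - yi for bi, yi in zip(b, matvec(A, x))]
    Ar = matvec(A, r)
    lam = dot(r, r) / dot(Ar, r)
    x = [xi + lam * ri for xi, ri in zip(x, r)]

for m in (0, 10, 40):
    print(m, "%.6e %.6e %.6e" % table[m])
```

The printed rows can be compared with the entries for m = 0, 10 and 40; the rows for m = 70 and 72 are dominated by round-off and are therefore not reproduced here.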
method of steepest descent (cont'd)
what's happening here?
[Figure: contour lines of F in the ($x_1$, $x_2$)-plane with the iterates $x_0, x_1, x_2, x_3, x_4$; $\lambda_1 = 2$, $\lambda_2 = 10$]
contour lines of F denote convergence process
stretched ellipses due to the differently large diagonal entries of A
residual vector always points into the direction of the point of origin, but the approximated solution might change its sign in every single iteration
motivates further considerations w.r.t. optimality of search directions

method of steepest descent (cont'd)
some thoughts about optimality
Definition 4.22
Let $F : \mathbb{R}^n \to \mathbb{R}$ be given, then $x \in \mathbb{R}^n$ is called
(a) optimal w.r.t. direction $p \in \mathbb{R}^n$ if $F(x) \le F(x + \lambda p)$ for all $\lambda \in \mathbb{R}$ applies,
(b) optimal w.r.t. subspace $U \subset \mathbb{R}^n$ if $F(x) \le F(x + \xi)$ for all $\xi \in U$ applies.
Lemma 4.23
Let F according to (4.2.3) be given, then $x \in \mathbb{R}^n$ is optimal w.r.t. $U \subset \mathbb{R}^n$ if $r = b - Ax \perp U$ applies.

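Lemma 4.23 can be checked with the same expansion that gives $\lambda_{opt}$. The following sketch assumes A is SPD, as throughout this section; the scalar $t$ is notation introduced here.

```latex
% For any \xi \in U:
\[
F(x + \xi) = F(x) - (r, \xi)_2 + \tfrac{1}{2}(A\xi, \xi)_2 .
\]
% If r \perp U, then (r, \xi)_2 = 0 and
%   F(x + \xi) - F(x) = \tfrac{1}{2}(A\xi, \xi)_2 \ge 0 ,
% so x is optimal w.r.t. U.
% The expansion also shows the converse: if (r, \xi)_2 \neq 0 for some \xi \in U,
% then with t = (r, \xi)_2 / (A\xi, \xi)_2
\[
F(x + t\xi) - F(x) = -\frac{(r, \xi)_2^2}{2\,(A\xi, \xi)_2} < 0 ,
\]
% so x is not optimal w.r.t. U.
```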
method of steepest descent (cont'd)
observation: the gradient method represents in every step a projection method with $K = L = \mathrm{span}\{r_{m-1}\}$
obviously, optimality of the approximated solution w.r.t. the entire subspace $U = \mathrm{span}\{r_0, r_1, \dots, r_{m-1}\}$ would be preferable
for linearly independent residual vectors, $x_n = A^{-1}b$ hereby follows at the latest
for the method of steepest descent all approximated solutions $x_m$ are optimal w.r.t. $r_{m-1}$ only
due to missing transitivity of the condition $r \perp p$, $r_{m-2} \perp r_m$ does not (necessarily) follow from $r_{m-2} \perp r_{m-1}$ and $r_{m-1} \perp r_m$
remedy: method of conjugate directions

method of conjugate directions
idea: extend optimality of approximated solution $x_m$ to the entire subspace $U_m = \mathrm{span}\{p_0, \dots, p_{m-1}\}$ with linearly independent search directions $p_i$
the following theorem formulates a condition for search directions that assures optimality w.r.t. $U_m$ in the (m+1)-st iteration step
Theorem 4.24
Let F according to (4.2.3) be given and $x \in \mathbb{R}^n$ be optimal w.r.t. subspace $U = \mathrm{span}\{p_0, \dots, p_{m-1}\} \subset \mathbb{R}^n$, then $\tilde{x} = x + \xi$ is optimal w.r.t. U iff
\[ A\xi \perp U \]
applies.

method of conjugate directions (cont'd)
if for search direction $p_m$ either
\[ Ap_m \perp U_m = \mathrm{span}\{p_0, \dots, p_{m-1}\} \]
or, equivalently,
\[ Ap_m \perp p_j, \quad j = 0, \dots, m-1 \]
applies, then the approximated solution $x_{m+1} = x_m + \lambda_m p_m$ inherits, according to 4.24, optimality from $x_m$ w.r.t. $U_m$ independent of the choice of the scalar weighting factor $\lambda_m$
this degree of freedom $\lambda_m$ will be used further to extend optimality w.r.t. $U_{m+1} = \mathrm{span}\{p_0, \dots, p_m\}$

method of conjugate directions (cont'd)
Definition 4.25
Let $A \in \mathbb{R}^{n \times n}$, then vectors $p_0, \dots, p_m \in \mathbb{R}^n$ are called pairwise conjugate or A-orthogonal if
\[ (p_i, p_j)_A := (Ap_i, p_j)_2 = 0 \quad \forall\, i, j \in \{0, \dots, m\} \text{ and } i \neq j. \]
Lemma 4.26
Let $A \in \mathbb{R}^{n \times n}$ be a symmetric and positive definite matrix and $p_0, \dots, p_{m-1} \in \mathbb{R}^n \setminus \{0\}$ be pairwise A-orthogonal, then
\[ \dim \mathrm{span}\{p_0, \dots, p_{m-1}\} = m \]
for m = 1, ..., n applies.

method of conjugate directions (cont'd)
(4.26) shows one important property of the method, namely the successive dimensional increment of subspaces $\{U_m\}_{m = 0, 1, \dots}$
let pairwise conjugate search directions $p_0, \dots, p_m \in \mathbb{R}^n \setminus \{0\}$ be given and $x_m$ be optimal w.r.t. $U_m = \mathrm{span}\{p_0, \dots, p_{m-1}\}$, thus we get optimality of
\[ x_{m+1} = x_m + \lambda p_m \]
w.r.t. $U_{m+1}$ if
\[ 0 = (b - Ax_{m+1}, p_j)_2 = \underbrace{(b - Ax_m, p_j)_2}_{=\,0 \text{ for } j \neq m} - \lambda \underbrace{(Ap_m, p_j)_2}_{=\,0 \text{ for } j \neq m} \]
for j = 0, ..., m applies

method of conjugate directions (cont'd)
for $\lambda$ we yield the following representation
\[ \lambda = \frac{(r_m, p_m)_2}{(Ap_m, p_m)_2} \]
and, thus, obtain the method of conjugate directions

    choose $x_0$
    $r_0 = b - Ax_0$
    for m = 0, 1, ..., n-1
        $\lambda_m = \frac{(r_m, p_m)_2}{(Ap_m, p_m)_2}$
        $x_{m+1} = x_m + \lambda_m p_m$
        $r_{m+1} = r_m - \lambda_m A p_m$

if search directions are chosen inappropriately, $x_n$ yields the exact solution even though $x_{n-1}$ may still have a large error
for given search directions the method can only be used as a direct one, which leads to huge computational complexity for large n
hence, a problem-oriented choice of search directions is inevitable

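The method of conjugate directions can be tried out once some set of pairwise conjugate directions is available. In the sketch below the directions are obtained by A-orthogonalising the unit vectors with a GRAM-SCHMIDT process in the $(\cdot,\cdot)_A$ inner product; this choice, the function names, and the 3x3 test system are assumptions made for illustration only.

```python
def dot(u, v):
    """EUCLIDEAN dot product (u, v)_2."""
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(A, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [dot(row, x) for row in A]


def conjugate_basis(A):
    """Hypothetical direction choice: A-orthogonalise e_1, ..., e_n."""
    n = len(A)
    P = []
    for k in range(n):
        e = [1.0 if i == k else 0.0 for i in range(n)]
        Ae = matvec(A, e)
        p = list(e)
        for q in P:
            coeff = dot(Ae, q) / dot(matvec(A, q), q)   # (A e_k, q) / (A q, q)
            p = [pi - coeff * qi for pi, qi in zip(p, q)]
        P.append(p)
    return P


def conjugate_directions(A, b, x0, P):
    """One pass over the pairwise conjugate directions in P."""
    x = list(x0)
    r = [bi - yi for bi, yi in zip(b, matvec(A, x))]    # r_0 = b - A x_0
    for p in P:
        Ap = matvec(A, p)
        lam = dot(r, p) / dot(Ap, p)                    # lambda_m
        x = [xi + lam * pi for xi, pi in zip(x, p)]     # x_{m+1}
        r = [ri - lam * api for ri, api in zip(r, Ap)]  # r_{m+1}
    return x


# Hypothetical SPD example (not from the slides); exact solution (2/9, 1/9, 13/9).
A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
P = conjugate_basis(A)
x = conjugate_directions(A, b, [0.0, 0.0, 0.0], P)
print(x)
```

After n = 3 steps the iterate agrees with the exact solution up to round-off, which illustrates the use of the method as a direct solver.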
CG: method of conjugate gradients
combination of methods of steepest descent and conjugate directions in order to obtain a problem-oriented approach w.r.t. selection of search directions and optimality w.r.t. orthogonality of search directions
with residual vectors $r_0, \dots, r_m$ we successively determine search directions for m = 0, ..., n-1 according to
\[ p_0 = r_0, \qquad p_m = r_m + \sum_{j=0}^{m-1} \alpha_j p_j \tag{4.2.5} \]
for $\alpha_j = 0$ (j = 0, ..., m-1) we achieve a selection of search directions analogous to the method of steepest descent
hence, under consideration of the already used search directions $p_0, \dots, p_{m-1} \in \mathbb{R}^n \setminus \{0\}$, there exist m degrees of freedom in choosing $\alpha_j$ to assure that the search directions are conjugate

CG: method of conjugate gradients (cont'd)
from the required A-orthogonality constraint, using (4.2.5), follows
\[ 0 = (Ap_m, p_i)_2 = (Ar_m, p_i)_2 + \sum_{j=0}^{m-1} \alpha_j (Ap_j, p_i)_2 \quad \text{for } i = 0, \dots, m-1 \]
hence, with $(Ap_j, p_i)_2 = 0$ for $i, j \in \{0, \dots, m-1\}$ and $i \neq j$ we obtain the wanted formula to compute the coefficients
\[ \alpha_i = -\frac{(Ar_m, p_i)_2}{(Ap_i, p_i)_2} \tag{4.2.6} \]

CG: method of conjugate gradients (cont'd)
thus we obtain the preliminary method of conjugate gradients

    choose $x_0$
    $p_0 = r_0 = b - Ax_0$
    for m = 0, 1, ..., n-1
        $\lambda_m = \frac{(r_m, p_m)_2}{(Ap_m, p_m)_2}$
        $x_{m+1} = x_m + \lambda_m p_m$
        $r_{m+1} = r_m - \lambda_m A p_m$
        $p_{m+1} = r_{m+1} - \sum_{j=0}^{m} \frac{(Ar_{m+1}, p_j)_2}{(Ap_j, p_j)_2} p_j$

CG: method of conjugate gradients (cont'd)
problem: for computation of $p_{m+1}$ all $p_j$ (j = 0, ..., m) are necessary due to
\[ p_{m+1} = r_{m+1} - \sum_{j=0}^{m} \frac{(Ar_{m+1}, p_j)_2}{(Ap_j, p_j)_2} p_j \]
in case the method does not stop before computation of $p_k$ for k > 0, then
(a) $p_m$ is conjugate to all $p_j$ with $0 \le j < m \le k$ due to (4.2.5) and (4.2.6),
(b) $U_{m+1} = \mathrm{span}\{p_0, \dots, p_m\} = \mathrm{span}\{r_0, \dots, r_m\}$ with $\dim U_{m+1} = m + 1$ for m = 0, ..., k-1,
(c) $r_m \perp U_m$ for m = 1, ..., k,
(d) $x_k = A^{-1}b \iff r_k = 0 \iff p_k = 0$,
(e) $U_{m+1} = \mathrm{span}\{r_0, \dots, A^m r_0\}$ for m = 0, ..., k-1,
(f) $r_m$ is conjugate to all $p_j$ with $0 \le j < m - 1 < k - 1$.
(Proof is lengthy)

CG: method of conjugate gradients (cont'd)
w.r.t. (f): for $0 \le j < m - 1 \le k - 1$ follows $p_j \in U_{m-1}$, hence $Ap_j \in U_m$ applies and we get
\[ (Ar_m, p_j)_2 \overset{A\ \text{symm.}}{=} (r_m, Ap_j)_2 \overset{(c)}{=} 0 \]
from (f) follows
\[ p_m = r_m - \sum_{j=0}^{m-1} \frac{(Ar_m, p_j)_2}{(Ap_j, p_j)_2} p_j = r_m - \frac{(Ar_m, p_{m-1})_2}{(Ap_{m-1}, p_{m-1})_2} p_{m-1} \tag{4.2.7} \]
furthermore, the method can stop in the (k+1)-st iteration if $p_k = 0$, i.e. according to (d) the solution $x_k = A^{-1}b$ has been found
$p_k = 0$ could be used as termination criterion; in the final algorithm the residual is used instead, as it is available w/o further computation

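CG: method of conjugate gradients (cont'd)
from (c) for $r_{m+1} = r_m - \lambda_m A p_m$ follows $(r_m - \lambda_m A p_m, r_m)_2 = 0$, hence
\[ \lambda_m = \frac{(r_m, r_m)_2}{(Ap_m, r_m)_2} \tag{4.2.8} \]
using (4.2.7) reveals
\[ (Ap_m, r_m)_2 = (Ap_m, p_m)_2 + \frac{(Ar_m, p_{m-1})_2}{(Ap_{m-1}, p_{m-1})_2} (Ap_m, p_{m-1})_2 = (Ap_m, p_m)_2 \]
thus with (4.2.8) follows
\[ \lambda_m = \frac{(r_m, r_m)_2}{(Ap_m, p_m)_2}, \quad \text{i.e. } (r_m, r_m)_2 = (r_m, p_m)_2 \]
for $\lambda_m \neq 0$, from the preliminary method follows $Ap_m = -\frac{1}{\lambda_m}(r_{m+1} - r_m)$, thus
\[ (Ar_{m+1}, p_m)_2 \overset{A\ \text{sym.}}{=} (r_{m+1}, Ap_m)_2 = -\frac{1}{\lambda_m}(r_{m+1}, r_{m+1} - r_m)_2 \overset{(b),(c)}{=} -\frac{1}{\lambda_m}(r_{m+1}, r_{m+1})_2 \]
\[ (Ap_m, p_m)_2 = -\frac{1}{\lambda_m}(r_{m+1} - r_m, p_m)_2 \overset{(b),(c)}{=} \frac{1}{\lambda_m}(r_m, p_m)_2 = \frac{1}{\lambda_m}(r_m, r_m)_2 \]
so that (4.2.7) becomes
\[ p_{m+1} = r_{m+1} + \frac{(r_{m+1}, r_{m+1})_2}{(r_m, r_m)_2} p_m \]
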
CG: method of conjugate gradients (cont'd)
hence, only one matrix-vector multiplication per iteration is necessary

    choose $x_0$
    $p_0 = r_0 = b - Ax_0$, $\alpha_0 = \|r_0\|_2^2$
    for m = 0, 1, ..., n-1
        if $\alpha_m \neq 0$
            $v_m = Ap_m$, $\lambda_m = \frac{\alpha_m}{(v_m, p_m)_2}$
            $x_{m+1} = x_m + \lambda_m p_m$
            $r_{m+1} = r_m - \lambda_m v_m$
            $\alpha_{m+1} = \|r_{m+1}\|_2^2$
            $p_{m+1} = r_{m+1} + \frac{\alpha_{m+1}}{\alpha_m} p_m$
        else
            STOP

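The final CG algorithm translates almost line by line into code. The sketch below is a plain-Python illustration; the threshold `tol` replacing the exact test $\alpha_m \neq 0$ and the 3x3 test system are assumptions introduced here.

```python
def dot(u, v):
    """EUCLIDEAN dot product (u, v)_2."""
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(A, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [dot(row, x) for row in A]


def conjugate_gradients(A, b, x0, tol=1e-12):
    """CG for SPD A; returns (x, number of iterations performed)."""
    n = len(b)
    x = list(x0)
    r = [bi - yi for bi, yi in zip(b, matvec(A, x))]    # r_0 = b - A x_0
    p = list(r)                                         # p_0 = r_0
    alpha = dot(r, r)                                   # alpha_0 = ||r_0||^2
    for m in range(n):
        if alpha ** 0.5 <= tol:                         # assumed stopping rule
            return x, m
        v = matvec(A, p)                                # only mat-vec per step
        lam = alpha / dot(v, p)                         # lambda_m
        x = [xi + lam * pi for xi, pi in zip(x, p)]     # x_{m+1}
        r = [ri - lam * vi for ri, vi in zip(r, v)]     # r_{m+1}
        alpha_new = dot(r, r)                           # alpha_{m+1}
        p = [ri + (alpha_new / alpha) * pi for ri, pi in zip(r, p)]
        alpha = alpha_new
    return x, n


# Hypothetical SPD example (not from the slides); exact solution (2/9, 1/9, 13/9).
A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
x, iterations = conjugate_gradients(A, b, [0.0, 0.0, 0.0])
print(x, iterations)
```

Only the current residual, search direction and iterate are stored, in contrast to the preliminary method, which needs all previous search directions.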
[Cartoon; source: sciencecartoonsplus.com (S. Harris)]

preliminary consideration
with $\{v_1, \dots, v_j\}$ let some orthogonal basis of $K_j = \mathrm{span}\{r_0, \dots, A^{j-1} r_0\}$ for j = 1, ..., m be given
due to $AK_m = \mathrm{span}\{Ar_0, \dots, A^m r_0\} \subset K_{m+1}$ the idea arises to write $v_{m+1}$ as
\[ v_{m+1} = Av_m + \xi \quad \text{with } \xi \in \mathrm{span}\{v_1, \dots, v_m\} = K_m \]
with $\xi = -\sum_{j=1}^{m} \alpha_j v_j$ follows
\[ (v_{m+1}, v_j)_2 = (Av_m, v_j)_2 - \alpha_j (v_j, v_j)_2, \]
whereby due to the orthogonality condition
\[ \alpha_j = \frac{(Av_m, v_j)_2}{(v_j, v_j)_2} \]
for j = 1, ..., m applies
in case of normed basis vectors, computation simplifies to $\alpha_j = (Av_m, v_j)_2$

preliminary consideration (cont'd)
for $r_0 \neq 0$ follows the ARNOLDI algorithm

    $v_1 = \frac{r_0}{\|r_0\|_2}$    (4.2.9)
    for j = 1, ..., m
        for i = 1, ..., j
            $h_{ij} = (v_i, Av_j)_2$
        $w_j = Av_j - \sum_{i=1}^{j} h_{ij} v_i$
        $h_{j+1,j} = \|w_j\|_2$
        if $h_{j+1,j} \neq 0$
            $v_{j+1} = \frac{w_j}{h_{j+1,j}}$
        else
            $v_{j+1} = 0$, STOP

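A minimal plain-Python sketch of the ARNOLDI algorithm as stated above. The storage layout (a list of basis vectors `V` and an (m+1) x m array `H` with zero-based indices) and the test matrix are assumptions made here; the exact test $h_{j+1,j} \neq 0$ is kept for simplicity, although in floating-point arithmetic a small threshold would be more robust.

```python
import math


def dot(u, v):
    """EUCLIDEAN dot product (u, v)_2."""
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(A, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [dot(row, x) for row in A]


def arnoldi(A, r0, m):
    """Return (V, H): basis vectors v_1, v_2, ... and the (m+1) x m matrix H.

    H[i][j] holds h_{i+1, j+1} (zero-based storage); requires r0 != 0.
    """
    beta = math.sqrt(dot(r0, r0))
    V = [[ri / beta for ri in r0]]                  # v_1 = r_0 / ||r_0||_2
    H = [[0.0] * m for _ in range(m + 1)]
    for j in range(m):
        Av = matvec(A, V[j])
        for i in range(j + 1):
            H[i][j] = dot(V[i], Av)                 # h_ij = (v_i, A v_j)_2
        w = list(Av)
        for i in range(j + 1):
            w = [wk - H[i][j] * vk for wk, vk in zip(w, V[i])]
        H[j + 1][j] = math.sqrt(dot(w, w))          # h_{j+1,j} = ||w_j||_2
        if H[j + 1][j] == 0.0:
            break                                   # v_{j+1} = 0, STOP
        V.append([wk / H[j + 1][j] for wk in w])
    return V, H


# Hypothetical non-symmetric example (not from the slides).
A = [[2.0, 1.0, 0.0], [0.0, 3.0, 1.0], [1.0, 0.0, 4.0]]
V, H = arnoldi(A, [1.0, 0.0, 0.0], 2)
print(V)
print(H)
```

For this example the relation $A v_j = \sum_{i=1}^{j+1} h_{ij} v_i$ can be checked column by column from the returned `V` and `H`.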
preliminary consideration (cont'd)
provided that ARNOLDI does not halt before computation of $v_m \neq 0$, then $V_j = \{v_1, \dots, v_j\}$ represents an orthonormal basis of the j-th KRYLOV subspace $K_j$ for j = 1, ..., m
using $V_m = (v_1 \dots v_m) \in \mathbb{R}^{n \times m}$ we get with
\[ H_m := V_m^T A V_m \in \mathbb{R}^{m \times m} \]
an upper HESSENBERG matrix, for which applies
\[ (H_m)_{ij} = \begin{cases} h_{ij} \text{ from ARNOLDI algorithm} & \text{for } i \le j + 1 \\ 0 & \text{for } i > j + 1 \end{cases} \]
provided that ARNOLDI does not halt before computation of $v_{m+1}$, then
\[ AV_m = V_{m+1} \bar{H}_m \tag{4.2.10} \]
applies with $\bar{H}_m \in \mathbb{R}^{(m+1) \times m}$ given by
\[ \bar{H}_m = \begin{pmatrix} H_m \\ 0 \; \cdots \; 0 \;\; h_{m+1,m} \end{pmatrix} \]

GMRES: generalised minimal residual
in contrast to the CG method, GMRES works for arbitrary regular matrices
conforms to a projection method with PETROV-GALERKIN condition $L_m = AK_m$
we define function
\[ F : \mathbb{R}^n \to \mathbb{R}, \quad x \mapsto \|b - Ax\|_2^2 \tag{4.2.11} \]
Lemma 4.27
Let $A \in \mathbb{R}^{n \times n}$ be regular and $b \in \mathbb{R}^n$ given, then for function F defined via (4.2.11) applies
\[ \hat{x} = \arg\min_{x \in \mathbb{R}^n} F(x) \quad \text{iff} \quad A\hat{x} = b. \]

GMRES: generalised minimal residual (cont'd)
Lemma 4.28
Let $F : \mathbb{R}^n \to \mathbb{R}$ according to (4.2.11) be given and $x_0 \in \mathbb{R}^n$ be arbitrary. Then follows
\[ \tilde{x} = \arg\min_{x \in x_0 + K_m} F(x) \tag{4.2.12} \]
iff
\[ b - A\tilde{x} \perp L_m = AK_m \]
applies.
(Proof is very lengthy)

GMRES: generalised minimal residual (cont'd)
GMRES is based on ARNOLDI for computation of an ONB $\{v_1, \dots, v_m\}$ of $K_m$
let $V_m = (v_1 \dots v_m) \in \mathbb{R}^{n \times m}$, hence any $x_m \in x_0 + K_m$ can be written as
\[ x_m = x_0 + V_m \alpha_m \quad \text{with } \alpha_m \in \mathbb{R}^m \]
with
\[ J_m : \mathbb{R}^m \to \mathbb{R}, \quad \alpha \mapsto \|b - A(x_0 + V_m \alpha)\|_2 \]
the minimisation problem (4.2.12) is equivalent to
\[ \alpha_m = \arg\min_{\alpha \in \mathbb{R}^m} J_m(\alpha), \qquad x_m = x_0 + V_m \alpha_m \]
hence, two central objectives are to find a simple computation of $\alpha_m$ and to compute $\alpha_m$ only in case $\|b - Ax_m\|_2 \le \varepsilon$ for given $\varepsilon > 0$

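GMRES: generalised minimal residual (cont'd)
with $r_0 = b - Ax_0$ and $e_1 = (1, 0, \dots, 0)^T \in \mathbb{R}^{m+1}$ follows
\begin{align*}
J_m(\alpha) &= \|b - A(x_0 + V_m \alpha)\|_2 = \|r_0 - AV_m \alpha\|_2 \\
 &\overset{(4.2.9)}{=} \bigl\| \|r_0\|_2\, v_1 - AV_m \alpha \bigr\|_2 \\
 &\overset{(4.2.10)}{=} \bigl\| \|r_0\|_2\, v_1 - V_{m+1} \bar{H}_m \alpha \bigr\|_2 \\
 &= \bigl\| V_{m+1} \bigl( \|r_0\|_2\, e_1 - \bar{H}_m \alpha \bigr) \bigr\|_2 \tag{4.2.13}
\end{align*}
where $\bar{H}_m$ represents matrix
\[ \bar{H}_m = \begin{pmatrix} H_m \\ 0 \; \cdots \; 0 \;\; h_{m+1,m} \end{pmatrix} \in \mathbb{R}^{(m+1) \times m} \]
with upper HESSENBERG matrix $H_m$
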
GMRES: generalised minimal residual (cont'd)
advantage: due to the structure of matrix $\bar{H}_m$, the minimal error can be computed w/o explicit calculation of $x_m$ (i.e. $x_m$ is computed only if $\|b - Ax_m\|_2 \le \varepsilon$)
Lemma 4.29
Provided that the ARNOLDI algorithm does not terminate before computation of $v_{m+1}$ and matrices $G_{i+1,i} \in \mathbb{R}^{(m+1) \times (m+1)}$ for i = 1, ..., m are given via
\[ G_{i+1,i} = \begin{pmatrix} I_{i-1} & & & \\ & c_i & s_i & \\ & -s_i & c_i & \\ & & & I_{m-i} \end{pmatrix} \]
with $c_i$ and $s_i$ defined as
\[ c_i = \frac{a}{\sqrt{a^2 + b^2}}, \qquad s_i = \frac{b}{\sqrt{a^2 + b^2}} \]

GMRES: generalised minimal residual (cont'd)
(Lemma 4.29 cont'd)
and with
\[ a = (G_{i,i-1} \cdots G_{3,2}\, G_{2,1}\, \bar{H}_m)_{i,i} \quad \text{and} \quad b = (G_{i,i-1} \cdots G_{3,2}\, G_{2,1}\, \bar{H}_m)_{i+1,i}, \]
then $Q_m = G_{m+1,m} \cdots G_{2,1}$ is an orthogonal matrix for which $Q_m \bar{H}_m = \bar{R}_m$ with
\[ \bar{R}_m = \begin{pmatrix} R_m \\ 0 \; \cdots \; 0 \end{pmatrix} \in \mathbb{R}^{(m+1) \times m} \]
applies and $R_m \in \mathbb{R}^{m \times m}$ being a regular upper triangular matrix.
(Proof is lengthy)

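The effect of the rotations in Lemma 4.29 can be illustrated on a small HESSENBERG matrix. The sketch below assumes the sign convention $\begin{pmatrix} c_i & s_i \\ -s_i & c_i \end{pmatrix}$ with $c_i = a / \sqrt{a^2 + b^2}$, $s_i = b / \sqrt{a^2 + b^2}$; the function name and the 3x2 example matrix are illustrative assumptions.

```python
import math


def givens_qr_hessenberg(H_bar):
    """Apply G_{2,1}, ..., G_{m+1,m} to an (m+1) x m upper HESSENBERG matrix.

    Returns (R_bar, rotations), where R_bar = Q_m H_bar has a zero
    subdiagonal and rotations is the list of pairs (c_i, s_i).
    """
    R = [row[:] for row in H_bar]
    m = len(R[0])
    rotations = []
    for i in range(m):
        a, b = R[i][i], R[i + 1][i]
        beta = math.hypot(a, b)          # nonzero if ARNOLDI did not terminate
        c, s = a / beta, b / beta
        for k in range(i, m):            # rotate rows i and i+1
            upper, lower = R[i][k], R[i + 1][k]
            R[i][k] = c * upper + s * lower
            R[i + 1][k] = -s * upper + c * lower
        rotations.append((c, s))
    return R, rotations


# Hypothetical example matrix (not from the slides).
H_bar = [[2.0, 0.0], [1.0, 4.0], [0.0, 1.0]]
R_bar, rotations = givens_qr_hessenberg(H_bar)
for row in R_bar:
    print(row)
```

Since each rotation is orthogonal, the EUCLIDEAN norm of every column is preserved, which gives a quick plausibility check of the result.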
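GMRES: generalised minimal residual (cont'd)
with $Q_m$ and $e_1 = (1, 0, \dots, 0)^T \in \mathbb{R}^{m+1}$ follows
\[ \bar{g}_m = \|r_0\|_2\, Q_m e_1 = (\gamma_1, \dots, \gamma_m, \gamma_{m+1})^T = (g_m^T, \gamma_{m+1})^T \tag{4.2.14} \]
hence, with (4.2.13) in case of $v_{m+1} \neq 0$ follows
\begin{align*}
\min J_m(\alpha) &= \min \bigl\| V_{m+1} \bigl( \|r_0\|_2\, e_1 - \bar{H}_m \alpha \bigr) \bigr\|_2 \\
 &= \min \bigl\| \|r_0\|_2\, e_1 - \bar{H}_m \alpha \bigr\|_2 \\
 &= \min \bigl\| Q_m \bigl( \|r_0\|_2\, e_1 - \bar{H}_m \alpha \bigr) \bigr\|_2 \\
 &\overset{\text{Lemma 4.29}}{=} \min \|\bar{g}_m - \bar{R}_m \alpha\|_2 \\
 &= \min \sqrt{|\gamma_{m+1}|^2 + \|g_m - R_m \alpha\|_2^2}
\end{align*}
due to regularity of $R_m$ follows
\[ \min J_m(\alpha) = |\gamma_{m+1}| \]
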
GMRES: generalised minimal residual (cont'd)
some observations
i. in case $v_{m+1} = 0$ follows
\[ \min J_m(\alpha) = \min \bigl\| V_m \bigl( \|r_0\|_2\, e_1 - H_m \alpha \bigr) \bigr\|_2 = \min \|g_m - R_m \alpha\|_2 = 0 \]
hence, in case $\min J_m(\alpha) = |\gamma_{m+1}| = 0$ the algorithm can terminate and the exact solution has been found
ii. with $\gamma_1, \dots, \gamma_{m+1}$ according to (4.2.14) follows
\[ \|r_j\|_2 = |\gamma_{j+1}| \le |\gamma_j| = \|r_{j-1}\|_2 \quad \text{for } j = 1, \dots, m \]
finally, we get the GMRES algorithm

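GMRES algorithm

    choose $x_0$ and compute $r_0 = b - Ax_0$
    $v_1 = \frac{r_0}{\|r_0\|_2}$, $\gamma_1 = \|r_0\|_2$
    for j = 1, ..., n
        $h_{ij} = (v_i, Av_j)_2$ for i = 1, ..., j
        $w_j = Av_j - \sum_{i=1}^{j} h_{ij} v_i$, $h_{j+1,j} = \|w_j\|_2$
        for i = 1, ..., j-1
            $\begin{pmatrix} h_{ij} \\ h_{i+1,j} \end{pmatrix} := \begin{pmatrix} c_i & s_i \\ -s_i & c_i \end{pmatrix} \begin{pmatrix} h_{ij} \\ h_{i+1,j} \end{pmatrix}$
        $\beta = \sqrt{h_{jj}^2 + h_{j+1,j}^2}$, $s_j = \frac{h_{j+1,j}}{\beta}$, $c_j = \frac{h_{jj}}{\beta}$, $h_{jj} = \beta$
        $\gamma_{j+1} = -s_j \gamma_j$, $\gamma_j = c_j \gamma_j$
        if $\gamma_{j+1} = 0$
            for i = j, ..., 1
                $\alpha_i = \frac{1}{h_{ii}} \bigl( \gamma_i - \sum_{k=i+1}^{j} h_{ik} \alpha_k \bigr)$
            $x = x_0 + \sum_{i=1}^{j} \alpha_i v_i$
            STOP
        else
            $v_{j+1} = \frac{w_j}{h_{j+1,j}}$
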
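The GMRES algorithm can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the lecture's reference implementation: the rotation sign convention $\gamma_{j+1} = -s_j \gamma_j$, the threshold `tol` replacing the exact test $\gamma_{j+1} = 0$, the forced termination after n steps, the zero-based storage, and the test system are all choices made here.

```python
import math


def dot(u, v):
    """EUCLIDEAN dot product (u, v)_2."""
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(A, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [dot(row, x) for row in A]


def gmres(A, b, x0, tol=1e-12):
    """Full GMRES (no restarts) for a regular matrix A."""
    n = len(b)
    r0 = [bi - yi for bi, yi in zip(b, matvec(A, x0))]
    gamma = [math.sqrt(dot(r0, r0))]                 # gamma_1 = ||r_0||_2
    if gamma[0] <= tol:
        return list(x0)
    V = [[ri / gamma[0] for ri in r0]]               # v_1
    H = [[0.0] * n for _ in range(n + 1)]
    c, s = [0.0] * n, [0.0] * n
    for j in range(n):
        Av = matvec(A, V[j])
        for i in range(j + 1):
            H[i][j] = dot(V[i], Av)                  # h_ij
        w = list(Av)
        for i in range(j + 1):
            w = [wk - H[i][j] * vk for wk, vk in zip(w, V[i])]
        H[j + 1][j] = math.sqrt(dot(w, w))           # h_{j+1,j}
        for i in range(j):                           # earlier rotations
            hi, hi1 = H[i][j], H[i + 1][j]
            H[i][j] = c[i] * hi + s[i] * hi1
            H[i + 1][j] = -s[i] * hi + c[i] * hi1
        beta = math.hypot(H[j][j], H[j + 1][j])
        s[j], c[j] = H[j + 1][j] / beta, H[j][j] / beta
        H[j][j] = beta
        gamma.append(-s[j] * gamma[j])               # gamma_{j+1}
        gamma[j] = c[j] * gamma[j]
        if abs(gamma[j + 1]) <= tol or j == n - 1:
            alpha = [0.0] * (j + 1)                  # back substitution
            for i in range(j, -1, -1):
                acc = gamma[i] - sum(H[i][k] * alpha[k] for k in range(i + 1, j + 1))
                alpha[i] = acc / H[i][i]
            return [x0[k] + sum(alpha[i] * V[i][k] for i in range(j + 1))
                    for k in range(n)]
        V.append([wk / H[j + 1][j] for wk in w])     # v_{j+1}


# Hypothetical non-symmetric regular example (not from the slides).
A = [[2.0, 1.0, 0.0], [0.0, 3.0, 1.0], [1.0, 0.0, 4.0]]
b = [1.0, 2.0, 3.0]
x = gmres(A, b, [0.0, 0.0, 0.0])
residual = [bi - yi for bi, yi in zip(b, matvec(A, x))]
print(x, math.sqrt(dot(residual, residual)))
```

The value $|\gamma_{j+1}|$ serves as the residual norm during the iteration, so the iterate itself is assembled only once, at termination.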