Iterative Methods for Linear Systems

1. Introduction: Direct solvers versus iterative solvers

In many applications we have to solve a linear system $Ax = b$ with $A \in \mathbb{R}^{n\times n}$ and $b \in \mathbb{R}^n$ given. If $n$ is large, solving the linear system takes a lot of operations, and standard Gaussian elimination may take too long. But in many cases most entries of the matrix $A$ are zero, and $A$ is a so-called sparse matrix. This means each equation only couples very few of the $n$ unknowns $x_1,\dots,x_n$. A typical example are discretizations of partial differential equations; see the next section for an example.

Direct solvers give the exact solution after finitely many operations (if we ignore roundoff errors).

Gaussian elimination with partial pivoting: This gives a decomposition
$$LU = \begin{bmatrix} \text{row } p_1 \text{ of } A \\ \vdots \\ \text{row } p_n \text{ of } A \end{bmatrix}$$
where $L$ is lower triangular and $U$ is upper triangular.

Cholesky decomposition: We need that $A$ is symmetric positive definite. This gives a decomposition $A = LL^\top$ where $L$ is lower triangular. The Cholesky decomposition takes about half the number of operations of Gaussian elimination.

Cost for Gaussian elimination and the Cholesky algorithm:

Full matrices: finding the decomposition takes $Cn^3$ operations. Once we have the decomposition, solving the linear system for a given vector $b$ takes $Cn^2$ operations.

Band matrices with bandwidth $m$, i.e., $A_{ij} = 0$ for $|i-j| > m$: finding the decomposition takes $Cm^2 n$ operations, and solving a linear system then takes $Cmn$ operations.

In Matlab we should initialize the matrix $A$ as a sparse matrix structure. Then Matlab will only use storage and operations to compute the nonzero elements of $L$, $U$. For a matrix $A$ with bandwidth $m$ the factors $L$, $U$ will also have bandwidth $m$ (for Cholesky or elimination without pivoting; with partial pivoting the upper bandwidth of $U$ can grow to $2m$). For a general sparse matrix $A$, the factors $L$, $U$ will usually have additional nonzero elements at locations where $A$ had zero elements. This is called fill-in, and it increases the number of operations.
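To make the band-matrix cost concrete, here is a minimal pure-Python sketch of a banded Cholesky factorization and solve (our own illustration, not Matlab's implementation; the helper names banded_cholesky and band_solve are hypothetical). Only entries with $|i-j| \le m$ are touched, so the factorization costs $O(m^2 n)$ and each solve $O(mn)$; the returned counter makes the linear growth in $n$ visible for fixed $m$.

```python
import math


def banded_cholesky(A, m):
    """Cholesky factor L (A = L L^T) of a symmetric positive definite matrix
    with bandwidth m, i.e. A[i][j] == 0 for |i - j| > m.

    A is stored densely (list of lists) for simplicity, but only entries
    inside the band are read or written, so the work is O(m^2 n).
    Returns (L, flops), where flops counts the multiply-subtract updates.
    """
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    flops = 0
    for j in range(n):
        # Diagonal entry: L_jj = sqrt(A_jj - sum of L_jk^2 inside the band).
        s = A[j][j]
        for k in range(max(0, j - m), j):
            s -= L[j][k] ** 2
            flops += 1
        L[j][j] = math.sqrt(s)
        # Below the diagonal only rows i <= j + m can be nonzero (no fill-in).
        for i in range(j + 1, min(n, j + m + 1)):
            s = A[i][j]
            for k in range(max(0, i - m), j):
                s -= L[i][k] * L[j][k]
                flops += 1
            L[i][j] = s / L[j][j]
    return L, flops


def band_solve(L, b, m):
    """Solve L L^T x = b by forward and back substitution in O(m n) work."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):  # forward substitution: L y = b
        s = b[i]
        for k in range(max(0, i - m), i):
            s -= L[i][k] * y[k]
        y[i] = s / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution: L^T x = y
        s = y[i]
        for k in range(i + 1, min(n, i + m + 1)):
            s -= L[k][i] * x[k]
        x[i] = s / L[i][i]
    return x
```

For the tridiagonal matrix with 2 on the diagonal and -1 next to it ($m = 1$), the counter equals $n - 1$, i.e., the work grows linearly in $n$.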
Reordering: If you use the Matlab command x=A\b (where the matrix has sparse array type), then Matlab will try to renumber the unknowns in such a way that the amount of fill-in is minimized. This can substantially reduce the number of operations. The command spparms('spumoni',2) makes Matlab print out details about the algorithms used for each following \ command. There are also versions of the lu and chol commands that use reordering to minimize fill-in.
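Why the numbering matters can be seen on the classic "arrow" matrix, where unknown 0 is coupled to all others and the others only to unknown 0. The sketch below (a hypothetical helper count_fill_in of our own; Matlab's actual reordering heuristics are more elaborate) counts fill-in symbolically: eliminating an unknown couples all of its not-yet-eliminated neighbours with each other.

```python
def count_fill_in(adj, order):
    """Number of new off-diagonal nonzeros (fill-in, each symmetric pair
    counted once) created by symmetric Gaussian elimination when the
    unknowns are eliminated in the given order.

    adj: dict mapping each unknown to the set of unknowns it is coupled to.
    """
    graph = {v: set(nbrs) for v, nbrs in adj.items()}
    fill = 0
    for v in order:
        nbrs = sorted(graph.pop(v))  # neighbours not yet eliminated
        for w in nbrs:
            graph[w].discard(v)
        # Eliminating v couples all of its remaining neighbours pairwise.
        for a in range(len(nbrs)):
            for c in range(a + 1, len(nbrs)):
                x, y = nbrs[a], nbrs[c]
                if y not in graph[x]:
                    graph[x].add(y)
                    graph[y].add(x)
                    fill += 1
    return fill


def arrow_matrix_graph(n):
    """Coupling graph of the n x n arrow matrix (dense first row/column)."""
    adj = {0: set(range(1, n))}
    for k in range(1, n):
        adj[k] = {0}
    return adj
```

With the natural order $0, 1, \dots, n-1$ the factors become completely dense ($(n-1)(n-2)/2$ fill-in pairs), while eliminating unknown 0 last produces no fill-in at all.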
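2. Convection diffusion problem in $\mathbb{R}^d$

We consider a typical application problem which leads to a large sparse linear system. Equilibrium problems for elastic deformations or heat transfer lead to elliptic differential equations.

Boundary value problem

In the convection diffusion problem we have a domain $\Omega \subset \mathbb{R}^d$. For $d = 1$ we consider an interval, for $d = 2$ a polygon, for $d = 3$ a polyhedron. We want to find a function $u(x)$ for $x \in \Omega$ such that
$$-\Delta u + b\cdot\nabla u = f \ \text{in } \Omega,\qquad u = 0 \ \text{on } \partial\Omega,$$
where $b \in \mathbb{R}^d$ is a constant vector, and $f$ is a given function on $\Omega$. This is called a boundary value problem.

Variational formulation

We first want to find the variational formulation: If we multiply the PDE by a test function $v$ which is zero on the boundary and integrate over $\Omega$, we obtain, using the first Green formula
$$\int_\Omega(-\Delta u)\,v\,dx = \int_\Omega \nabla u\cdot\nabla v\,dx - \int_{\partial\Omega}(\partial_n u)\,v\,ds,$$
in which the boundary term vanishes because $v = 0$ on $\partial\Omega$:
$$\underbrace{\int_\Omega\left[\nabla u\cdot\nabla v + (b\cdot\nabla u)\,v\right]dx}_{a(u,v)} = \underbrace{\int_\Omega f v\,dx}_{l(v)}.$$
We use the Hilbert space
$$V = H^1_0(\Omega) = \left\{u \;\middle|\; \int_\Omega\left(|\nabla u|^2 + u^2\right)dx < \infty,\ u = 0 \text{ on } \partial\Omega\right\}$$
with the norm
$$\|u\|_V^2 = \int_\Omega\left(|\nabla u|^2 + u^2\right)dx = \|u\|_{L^2(\Omega)}^2 + \|\nabla u\|_{L^2(\Omega)}^2.$$

Note that (here for $d = 2$) $\int_\Omega (b\cdot\nabla u)\,u\,dx = \int_\Omega\left(b_1 u_{x_1}u + b_2 u_{x_2}u\right)dx = 0$: For
$$\int_{x_2=\alpha}^{\beta}\int_{x_1=g_1(x_2)}^{g_2(x_2)} b_1 u_{x_1}u\,dx_1\,dx_2$$
integration by parts gives for the inner integral
$$\int_{x_1=g_1(x_2)}^{g_2(x_2)} u_{x_1}u\,dx_1 = -\int_{x_1=g_1(x_2)}^{g_2(x_2)} u\,u_{x_1}\,dx_1$$
since $u$ is zero on the boundary, so the inner integral vanishes; the term with $b_2$ is handled in the same way.

We then obtain that $a(\cdot,\cdot): V \times V \to \mathbb{R}$ is a bilinear form satisfying for all $u, v \in V$
$$|a(u,v)| \le L_a\|u\|_V\|v\|_V \tag{1}$$
$$a(u,u) \ge \gamma_a\|u\|_V^2 \tag{2}$$
The first inequality follows from the Cauchy-Schwarz inequality. For the second inequality we use $a(u,u) = \int_\Omega |\nabla u|^2\,dx$ and the Poincaré inequality $\|u\|_{L^2(\Omega)} \le C\|\nabla u\|_{L^2(\Omega)}$. We obtain that $l: V \to \mathbb{R}$ is a linear functional such that $|l(v)| \le C_l\|v\|_V$. The variational formulation is: Find $u \in V$ such that
$$\forall v \in V:\quad a(u,v) = l(v).$$
By the Lax-Milgram theorem (see Appendix A below) this variational problem has a unique solution $u \in V$.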
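In one dimension the integration-by-parts argument above reduces to $\int_0^1 u'u\,dx = 0$, and coercivity (2) follows from the Poincaré inequality, which on $(0,1)$ holds with constant $1/\pi$. The sketch below checks both numerically; the domain $(0,1)$, the test function $u(x) = x(1-x)e^x$ and the midpoint quadrature are our own assumptions for illustration.

```python
import math


def midpoint(g, n=2000):
    """Composite midpoint rule for the integral of g over (0, 1)."""
    h = 1.0 / n
    return h * sum(g((k + 0.5) * h) for k in range(n))


def u(x):
    """Hypothetical test function with u(0) = u(1) = 0."""
    return x * (1.0 - x) * math.exp(x)


def du(x):
    """Its derivative u'(x)."""
    return (1.0 - 2.0 * x) * math.exp(x) + x * (1.0 - x) * math.exp(x)


conv = midpoint(lambda x: du(x) * u(x))  # int u' u dx, should vanish
diff = midpoint(lambda x: du(x) ** 2)    # ||u'||^2
mass = midpoint(lambda x: u(x) ** 2)     # ||u||^2

# Poincare on (0, 1): ||u||^2 <= ||u'||^2 / pi^2. Since the convection term
# vanishes, a(u, u) = ||u'||^2 >= gamma_a * ||u||_V^2 for any constant b,
# with gamma_a = pi^2 / (pi^2 + 1).
gamma_a = math.pi ** 2 / (math.pi ** 2 + 1.0)
```

Here `diff` plays the role of $a(u,u)$ and `diff + mass` the role of $\|u\|_V^2$.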
Finite element discretization

We choose a finite dimensional subspace $V_h \subset V$. For $d = 2$ the domain $\Omega$ is a polygon, and we divide it into a mesh of triangles. (For $d = 1$ we divide the interval into subintervals; for $d = 3$ we divide the polyhedron into a mesh of tetrahedra.) Then we define $V_h$ as the space of piecewise linear functions on the mesh which are continuous in $\Omega$ and are zero on the boundary. The discrete problem is: Find $u_h \in V_h$ such that
$$\forall v_h \in V_h:\quad a(u_h, v_h) = l(v_h). \tag{3}$$
Since $V_h \subset V$, the inequalities (1), (2) are satisfied for $u, v \in V_h$. Hence by the Lax-Milgram theorem the discrete problem has a unique solution $u_h$.

We can specify a function $v_h \in V_h$ by specifying the values $v_1,\dots,v_n$ at the interior nodes $x_1,\dots,x_n$ of the mesh. The basis function $\varphi_j$ is the function in $V_h$ with $\varphi_j(x_j) = 1$ and $\varphi_j(x_k) = 0$ for $k \ne j$. We can then write $u_h$ as $u_h = u_1\varphi_1 + \dots + u_n\varphi_n$, where $u = (u_1,\dots,u_n)^\top \in \mathbb{R}^n$ is the coefficient vector. Now (3) for $v_h = \varphi_1,\dots,\varphi_n$ gives the linear system
$$Au = b,\qquad A_{jk} = a(\varphi_k, \varphi_j),\qquad b_j = l(\varphi_j).$$
Therefore the finite element method involves the following steps: (i) pick a mesh on $\Omega$, (ii) assemble the stiffness matrix $A$ and the right hand side vector $b$, (iii) solve the linear system $Au = b$.

Work for direct solvers

For $d = 1$ we obtain a tridiagonal matrix $A$. Hence the work is proportional to $N = 1/h$. As a simple example for $d = 3$ consider the cube $\Omega = (0,1)^3$. Let $h = 1/N$ with a positive integer $N$. By using uniform grids $x_1 = j_1/N$, $x_2 = j_2/N$, $x_3 = j_3/N$ with $j_1, j_2, j_3 \in \{0,\dots,N\}$ for each coordinate we can subdivide $\Omega$ into $N^3$ smaller cubes. We can then subdivide each of the smaller cubes into tetrahedra. We have $n = (N-1)^3$ interior nodes with $j_1, j_2, j_3 \in \{1,\dots,N-1\}$, and we can order them lexicographically by $(j_1, j_2, j_3)$: $(1,1,1),\dots,(1,1,N-1),(1,2,1),\dots,(N-1,N-1,N-1)$. Then the resulting stiffness matrix $A$ has size $n \times n$ with $n = (N-1)^3$, and bandwidth of order $(N-1)^2$.
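The three finite element steps listed above can be sketched for $d = 1$, assuming $\Omega = (0,1)$, a uniform mesh with $h = 1/N$, a scalar convection coefficient beta, and $f = 1$ (so that $b_j = \int\varphi_j\,dx = h$ exactly). Integrating the element contributions gives the diffusion part $\frac1h(-1, 2, -1)$ and the antisymmetric convection part $\frac{\beta}{2}(-1, 0, 1)$. The helper names assemble_1d and thomas are hypothetical; the solver is plain tridiagonal elimination without pivoting, which is safe here because the matrix is diagonally dominant for $\beta h \le 2$.

```python
import math


def assemble_1d(N, beta):
    """Stiffness matrix for -u'' + beta u' on (0,1), u(0) = u(1) = 0,
    piecewise linear elements, h = 1/N, A_jk = a(phi_k, phi_j).

    Returns the diagonals (lower, diag, upper) of the tridiagonal
    (N-1) x (N-1) matrix A = A_diff + A_conv; lower[i] = A_{i+1,i}.
    """
    h = 1.0 / N
    n = N - 1
    diag = [2.0 / h] * n                        # diffusion: (1/h)(-1, 2, -1)
    lower = [-1.0 / h - beta / 2.0] * (n - 1)   # convection adds -beta/2
    upper = [-1.0 / h + beta / 2.0] * (n - 1)   # convection adds +beta/2
    return lower, diag, upper


def thomas(lower, diag, upper, rhs):
    """Tridiagonal Gaussian elimination without pivoting (Thomas algorithm)."""
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0] if n > 1 else 0.0
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i - 1] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i - 1] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x
```

For beta = 0 the nodal values coincide with the exact solution $x(1-x)/2$; for beta = 1 the exact solution is $x - (e^x - 1)/(e - 1)$, and the nodal error is small. The convection entries are antisymmetric, as the splitting into $A_{\text{diff}} + A_{\text{conv}}$ later in the text requires.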
This will also hold for a more general domain $\Omega \subset \mathbb{R}^d$, assuming that all triangles/tetrahedra are of size $h$, up to a constant: we will have $n = \dim V_h \le ch^{-d}$ and a bandwidth $m \le ch^{1-d}$. Therefore, with $h \approx 1/N$, the work for Gaussian elimination is
$$m^2 n \le C\left(N^{d-1}\right)^2 N^d = CN^{3d-2}.$$
For $d = 2$ we therefore have $O(N^4)$ operations. For $d = 3$ we have $O(N^7)$ operations. Using Matlab's reordering algorithms reduces the work to $N^3$ for $d = 2$, but for $d = 3$ it does not improve the rate $O(N^7)$.

Work of direct solvers for the convection-diffusion problem:

| Method | d = 1 | d = 2 | d = 3 |
|---|---|---|---|
| Gaussian elimination (using band structure) | $N$ | $N^4$ | $N^7$ |
| Gaussian elimination with reordering | $N$ | $N^3$ | $N^7$ |
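The bandwidth claim for the lexicographic numbering can be checked with a small sketch. As a simplifying assumption we couple each interior node of $(0,1)^d$ only to its $2d$ axis neighbours (a finite-difference-like stencil); a tetrahedral mesh also couples some diagonal neighbours, which changes the bandwidth only by lower-order terms. The helper names are hypothetical.

```python
from itertools import product


def lexicographic_bandwidth(N, d):
    """Bandwidth of the matrix coupling each interior grid node of (0,1)^d
    (mesh size h = 1/N) to its 2d axis neighbours, when the nodes
    (j_1, ..., j_d) in {1, ..., N-1}^d are numbered lexicographically
    (last index varies fastest, as in the text)."""
    index = {j: k for k, j in enumerate(product(range(1, N), repeat=d))}
    m = 0
    for j, k in index.items():
        for axis in range(d):
            nb = j[:axis] + (j[axis] + 1,) + j[axis + 1:]
            if nb in index:
                m = max(m, index[nb] - k)
    return m


def band_work(N, d):
    """Work estimate m^2 n for banded Gaussian elimination on this grid."""
    return lexicographic_bandwidth(N, d) ** 2 * (N - 1) ** d
```

The bandwidth comes out as $(N-1)^{d-1}$, so band_work returns $(N-1)^{3d-2}$, matching the exponents in the table.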
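Estimates $|(Au, v)| \le L\|u\|\|v\|$ and $(Au, u) \ge \gamma\|u\|^2$ for the stiffness matrix $A$

A function $v_h \in V_h$ is given by a coefficient vector $v = (v_1,\dots,v_n)^\top$. For the function $v_h$ we have the norms $\|v_h\|_{L^2}$ and $\|\nabla v_h\|_{L^2}$. How are these norms related to the norm $\|v\|$ of the coefficient vector $v$?

We assume that all triangles are of size $h$, up to a constant. More precisely: we assume that a circle of radius $c_0 h$ fits inside each triangle, and each triangle fits inside a circle of radius $C_0 h$. Then one can show that there exist constants $c_1, c_2, c_3$ depending on $c_0$ and $C_0$ such that
$$c_1 h^{d/2}\|v\| \le \|v_h\|_{L^2} \le c_2 h^{d/2}\|v\|,\qquad \|\nabla v_h\|_{L^2} \le c_3 h^{-1} h^{d/2}\|v\|.$$
This implies (for $h \le 1$)
$$\|v_h\|_{H^1}^2 = \|v_h\|_{L^2}^2 + \|\nabla v_h\|_{L^2}^2 \le c\,h^{d-2}\|v\|^2.$$
Hence we obtain for functions $u_h, v_h \in V_h$ with coefficient vectors $u, v$
$$|(Au, v)| = |a(u_h, v_h)| \le L_a\|u_h\|_{H^1}\|v_h\|_{H^1} \le L_a c\,h^{d-2}\|u\|\|v\| \tag{4}$$
$$(Au, u) = a(u_h, u_h) \ge \gamma_a\|u_h\|_{H^1}^2 \ge \gamma_a\|u_h\|_{L^2}^2 \ge \gamma_a c_1^2 h^d\|u\|^2 \tag{5}$$
Hence we obtain
$$L = \|A\| \le Ch^{d-2},\qquad \gamma = \lambda_{\min}\left(\tfrac12(A + A^\top)\right) \ge C'h^d.$$
We can split the bilinear form $a(u, v)$ into a diffusion part and a convection part,
$$a(u,v) = \underbrace{\int_\Omega \nabla u\cdot\nabla v\,dx}_{a_{\text{diff}}(u,v)} + \underbrace{\int_\Omega (b\cdot\nabla u)\,v\,dx}_{a_{\text{conv}}(u,v)},$$
hence we have $A = A_{\text{diff}} + A_{\text{conv}}$ with
$$|(A_{\text{diff}}u, v)| = \left|\int_\Omega \nabla u_h\cdot\nabla v_h\,dx\right| \le \|\nabla u_h\|_{L^2}\|\nabla v_h\|_{L^2} \le c_3^2 h^{d-2}\|u\|\|v\|,$$
$$|(A_{\text{conv}}u, v)| = \left|\int_\Omega (b\cdot\nabla u_h)\,v_h\,dx\right| \le \|\nabla u_h\|_{L^2}\,|b|\,\|v_h\|_{L^2} \le c_2 c_3|b|\,h^{d-1}\|u\|\|v\|.$$

3. 1-step minimum residual method, aka GMRES(1)

We want to solve the linear system $Au = b$ where $A \in \mathbb{R}^{n\times n}$ and $b \in \mathbb{R}^n$. We assume that $A$ is positive definite: $(Au, u) \ge \gamma\|u\|^2$ for all $u \in \mathbb{R}^n$.

The current guess is $u^{(k)}$. We compute the residual $r^{(k)} := b - Au^{(k)}$ and define the new guess as $u^{(k+1)} := u^{(k)} + \alpha_k r^{(k)}$, where we choose $\alpha_k$ such that the new residual $r^{(k+1)} := b - Au^{(k+1)}$ has minimum norm $\|r^{(k+1)}\|$, yielding
$$\alpha_k := \frac{(Ar^{(k)}, r^{(k)})}{\|Ar^{(k)}\|^2}.$$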
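The 1-step minimum residual method just described fits in a few lines. This is a minimal dense pure-Python sketch (the helper names and the stopping rule $\|r^{(k)}\| \le$ tol are our own choices); it updates the residual recursively as $r^{(k+1)} = r^{(k)} - \alpha_k Ar^{(k)}$, so each step needs exactly one matrix-vector product.

```python
import math


def matvec(A, x):
    """Dense matrix-vector product A x (A given as a list of rows)."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def min_res_1step(A, b, u0, tol=1e-10, max_iter=10_000):
    """1-step minimum residual method (GMRES(1)) for A u = b.

    Each step: alpha_k = (A r, r) / ||A r||^2, u <- u + alpha_k r,
    r <- r - alpha_k A r. Returns (u, list of residual norms ||r^(k)||).
    """
    u = list(u0)
    r = [bi - ai for bi, ai in zip(b, matvec(A, u))]
    norms = [math.sqrt(dot(r, r))]
    for _ in range(max_iter):
        if norms[-1] <= tol:
            break
        Ar = matvec(A, r)  # the one matrix-vector product of this step
        alpha = dot(Ar, r) / dot(Ar, Ar)
        u = [ui + alpha * ri for ui, ri in zip(u, r)]
        r = [ri - alpha * ari for ri, ari in zip(r, Ar)]  # r_new = (I - alpha A) r
        norms.append(math.sqrt(dot(r, r)))
    return u, norms
```

For example, $A = \begin{bmatrix} 4 & 1 \\ -1 & 3\end{bmatrix}$ is nonsymmetric but positive definite ($\frac12(A + A^\top) = \operatorname{diag}(4, 3)$); with $b = (1, 2)^\top$ the iteration converges to $u = (1/13, 9/13)^\top$ with monotonically decreasing residual norms.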
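Note that each step requires one matrix-vector product $Ar^{(k)}$ (and a few inner products of vectors). If the matrix $A$ satisfies $|(Au,v)| \le L\|u\|\|v\|$ and $(Au,u) \ge \gamma\|u\|^2$, as (4), (5) do for the stiffness matrix, we obtain (see Appendix A below)
$$\|r^{(k+1)}\| \le \left(1 - K^{-1}\right)^{1/2}\|r^{(k)}\| \quad\text{with}\quad K = \left(\frac{L}{\gamma}\right)^2, \tag{6}$$
implying
$$\|r^{(k)}\| \le \left(1 - K^{-1}\right)^{k/2}\|r^{(0)}\|,\qquad \|u^{(k)} - u\| \le \gamma^{-1}\left(1 - K^{-1}\right)^{k/2}\|r^{(0)}\|. \tag{7}$$
Let $\kappa := L/\gamma$. Since $\|A^{-1}\| \le \gamma^{-1}$, we have $\operatorname{cond}(A) \le \kappa$:
$$\operatorname{cond}(A) = \|A\|\|A^{-1}\| \le L\gamma^{-1} = \kappa.$$
If the matrix $A$ is symmetric (and $L$, $\gamma$ are the best possible constants), we have $\operatorname{cond}(A) = \kappa$:
$$L = \|A\| = \lambda_{\max}(A),\quad \gamma = \lambda_{\min}(A),\quad \operatorname{cond}(A) = \|A\|\|A^{-1}\| = L\gamma^{-1} = \kappa.$$

Assume that we have $\|r^{(k+1)}\| \le q\|r^{(k)}\|$ with $q = 1 - \varepsilon$. In order to achieve $\|r^{(k)}\| \le \delta$ we need to pick $k$ such that $\|r^{(k)}\| \le q^k\|r^{(0)}\| \le \delta$, hence
$$k \ge \frac{\log\left(\delta/\|r^{(0)}\|\right)}{\log q}.$$
For $q = 1 - \varepsilon$ the first-order Taylor approximation gives $\log(1-\varepsilon) \approx -\varepsilon$, hence we need approximately
$$k \approx \varepsilon^{-1}\log\left(\frac{\|r^{(0)}\|}{\delta}\right) = \varepsilon^{-1}C_\delta$$
steps for the iterative method. Here we have $q = (1 - K^{-1})^{1/2} \approx 1 - \tfrac12 K^{-1}$ using Taylor. Hence we need
$$k \approx 2C_\delta K = 2C_\delta\kappa^2$$
steps for the iterative method.

Note that for the convection diffusion problem we have $\kappa \le Ch^{-2}$ and $q \approx 1 - ch^4$. Therefore it would seem that we need $Ch^{-4}$ steps of our iterative method. But it turns out that this estimate is too pessimistic. Actually we have $q \approx 1 - ch^2$, and we need only $Ch^{-2}$ steps of our iterative method, as we will see in the next section.

4. Sharper estimates for the convergence factor

Symmetric case

Recall that $r^{(k+1)} = (I - \alpha A)r^{(k)}$, so $\|r^{(k+1)}\| \le \|I - \alpha A\|\|r^{(k)}\|$. Since the 1-step minimum residual method chooses $\alpha_k$ to minimize $\|r^{(k+1)}\|$, the bound obtained with any fixed $\alpha$ also holds for it. If $A$ is symmetric, then also $I - \alpha A$ is symmetric, and we have with the eigenvalues $\lambda_1,\dots,\lambda_n$ of $A$
$$\|I - \alpha A\| = \max_{j=1,\dots,n}|1 - \alpha\lambda_j|. \tag{8}$$
If $A$ is positive definite, the eigenvalues are positive, and we can minimize (8) by choosing $\alpha$ such that $-(1 - \alpha\lambda_{\max}) = 1 - \alpha\lambda_{\min}$, yielding
$$\alpha = \frac{2}{\lambda_{\max} + \lambda_{\min}},\qquad \|I - \alpha A\| = \frac{\lambda_{\max} - \lambda_{\min}}{\lambda_{\max} + \lambda_{\min}} = 1 - \frac{2}{\kappa + 1}\quad\text{with}\quad \kappa = \frac{\lambda_{\max}}{\lambda_{\min}}.$$
So the convergence factor is $q = 1 - \frac{2}{\kappa + 1}$, and the number of iterations is proportional to $\kappa = \operatorname{cond}(A)$ (and not to $\kappa^2$, as the earlier estimate (6) would suggest).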
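The difference between "proportional to $\kappa$" and "proportional to $\kappa^2$" is easy to quantify. The sketch below (helper name iterations_needed is ours; $\kappa = 100$ and a residual reduction by $10^{-6}$ are arbitrary choices) evaluates $k = \lceil\log(\delta/\|r^{(0)}\|)/\log q\rceil$ for both convergence factors and also checks that for $A = \operatorname{diag}(1, \kappa)$ the optimal $\alpha$ damps both eigencomponents by exactly $(\kappa-1)/(\kappa+1)$.

```python
import math


def iterations_needed(q, reduction):
    """Smallest k with q**k <= reduction, e.g. reduction = delta / ||r^(0)||."""
    return math.ceil(math.log(reduction) / math.log(q))


kappa = 100.0
q_sym = (kappa - 1.0) / (kappa + 1.0)      # symmetric case: 1 - 2/(kappa + 1)
q_old = math.sqrt(1.0 - 1.0 / kappa ** 2)  # estimate (6) with K = kappa^2
k_sym = iterations_needed(q_sym, 1e-6)     # roughly 690 steps
k_old = iterations_needed(q_old, 1e-6)     # roughly 276,000 steps

# For A = diag(1, kappa) and alpha = 2/(lambda_max + lambda_min), every
# eigencomponent of the residual is multiplied by |1 - alpha*lambda| = q_sym.
lam = [1.0, kappa]
alpha = 2.0 / (lam[0] + lam[-1])
factors = [abs(1.0 - alpha * l) for l in lam]
```

The ratio k_old / k_sym is about 400, i.e. of the same order as $\kappa$, as the Taylor estimates predict.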
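Nonsymmetric case

We can write $A$ as a sum of a symmetric part $H$ and an antisymmetric part $S$:
$$A = H + S,\qquad H := \tfrac12(A + A^\top),\qquad S := \tfrac12(A - A^\top).$$
We assume that $A$ is positive definite, i.e., $(Av, v) = (Hv, v) > 0$ for $v \ne 0$. Let $u$ denote the current guess and $r := b - Au$ the residual. The next approximation is $u_{\text{new}} = u + \alpha r$, with the residual $r_{\text{new}} = b - Au_{\text{new}} = (I - \alpha A)r$. Hence
$$\|r_{\text{new}}\|^2 = ((I - \alpha A)r, (I - \alpha A)r) = \|r\|^2 - 2\alpha(Ar, r) + \alpha^2\|Ar\|^2.$$
Note that both $\|Ar\|$ and $(Ar, r)^{1/2} = (Hr, r)^{1/2}$ define norms on $\mathbb{R}^n$. Therefore there exists $C > 0$ such that
$$\|Ar\|^2 \le C\,(Ar, r)\quad\text{for all } r \in \mathbb{R}^n. \tag{9}$$
Then
$$\|r_{\text{new}}\|^2 \le \|r\|^2 + \left[-2\alpha + C\alpha^2\right](Ar, r).$$
The bound is minimal for $\alpha = C^{-1}$, and with this we get, using $(Ar, r) \ge \gamma\|r\|^2$ with $\gamma = \lambda_{\min}(H)$,
$$\|r_{\text{new}}\|^2 \le \|r\|^2 - C^{-1}(Ar, r) \le \left[1 - \frac{\gamma}{C}\right]\|r\|^2.$$
It remains to find $C$ such that (9) holds. With $v = Ar$ we get
$$(Ar, r) = (Hr, r) = (HA^{-1}v, A^{-1}v).$$
Then we obtain $(v, v) \le C\,(Bv, v)$ with $B := A^{-\top}HA^{-1}$ and $C = \lambda_{\min}(B)^{-1} = \lambda_{\max}(B^{-1})$, since $B$ is symmetric. Hence we need an estimate $(w, B^{-1}w) \le C\,(w, w)$ with $B^{-1} = AH^{-1}A^\top$. Using $A^\top = H - S$ we get
$$(w, B^{-1}w) = ((H - S)w, H^{-1}(H - S)w) = (Hw, w) - \underbrace{(Sw, w)}_{=0} - \underbrace{(Hw, H^{-1}Sw)}_{=0} + (Sw, H^{-1}Sw) \le \lambda_{\max}(H)\|w\|^2 + \lambda_{\min}(H)^{-1}\|Sw\|^2.$$
Since $(Sw, Sw) = (-S^2 w, w) \le \rho(S)^2\|w\|^2$, we obtain
$$C = \lambda_{\max}(H) + \frac{\rho(S)^2}{\lambda_{\min}(H)}.$$
With this $C$ we have $\gamma/C = \lambda_{\min}(H)/\left(\lambda_{\max}(H) + \rho(S)^2/\lambda_{\min}(H)\right) = K^{-1}$ with $K$ as in (10) below, and the minimum residual step is at least as good as the step with $\alpha = C^{-1}$.

Note: the eigenvalues of $H$ are real and positive. The eigenvalues of $S$ are of the form $\pm\alpha_j i$ with $\alpha_j \ge 0$.
Proof: Let $\mu_j$ denote the eigenvalues of $S$. The matrix $-S^2 = S^\top S$ is symmetric and has real eigenvalues $-\mu_j^2 \ge 0$ because $(-S^2 w, w) = (Sw, Sw) \ge 0$. Since the matrix $S$ is real, taking the complex conjugate of $Sv = \mu_j v$ gives $S\bar v = \bar\mu_j\bar v$. Hence $\bar\mu_j$ is also an eigenvalue of $S$.

Theorem 4.1. Let $A \in \mathbb{R}^{n\times n}$, $H := \tfrac12(A + A^\top)$, $S := \tfrac12(A - A^\top)$. If $A$ is positive definite, i.e., $\lambda_{\min}(H) > 0$, the 1-step minimum residual iteration satisfies
$$\|r^{(k+1)}\| \le \left(1 - K^{-1}\right)^{1/2}\|r^{(k)}\|,\qquad K := \operatorname{cond}(H) + \left(\frac{\rho(S)}{\lambda_{\min}(H)}\right)^2. \tag{10}$$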
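Theorem 4.1 can be checked on $2\times 2$ examples, where the eigenvalues of $H$ have a closed form and $S = \begin{bmatrix} 0 & s \\ -s & 0 \end{bmatrix}$ has $\rho(S) = |s|$. In the sketch below (hypothetical helper names), residual_ratio computes $\|r_{\text{new}}\|/\|r\|$ for one minimum residual step via $\|r_{\text{new}}\|^2 = \|r\|^2 - (Ar, r)^2/\|Ar\|^2$, and theorem_41_factor evaluates $(1 - K^{-1})^{1/2}$. For $A = I + S$ the bound is attained for every $r$, since then $(Ar, r) = \|r\|^2$ and $\|Ar\|^2 = (1 + s^2)\|r\|^2$.

```python
import math


def residual_ratio(A, r):
    """||r_new|| / ||r|| after one minimum residual step from residual r (2x2 A)."""
    Ar = [A[0][0] * r[0] + A[0][1] * r[1], A[1][0] * r[0] + A[1][1] * r[1]]
    Ar_r = Ar[0] * r[0] + Ar[1] * r[1]
    Ar_Ar = Ar[0] ** 2 + Ar[1] ** 2
    r_r = r[0] ** 2 + r[1] ** 2
    # ||r_new||^2 = ||r||^2 - (Ar, r)^2 / ||Ar||^2 for the optimal alpha
    return math.sqrt(1.0 - Ar_r ** 2 / (Ar_Ar * r_r))


def theorem_41_factor(A):
    """(1 - 1/K)^(1/2) with K = cond(H) + (rho(S) / lambda_min(H))^2, 2x2 A."""
    h12 = 0.5 * (A[0][1] + A[1][0])
    s = 0.5 * (A[0][1] - A[1][0])  # S = [[0, s], [-s, 0]], rho(S) = |s|
    mean = 0.5 * (A[0][0] + A[1][1])
    rad = math.hypot(0.5 * (A[0][0] - A[1][1]), h12)
    lam_min, lam_max = mean - rad, mean + rad  # eigenvalues of H
    K = lam_max / lam_min + (abs(s) / lam_min) ** 2
    return math.sqrt(1.0 - 1.0 / K)
```

For $A = \begin{bmatrix} 2 & 1 \\ -1 & 1 \end{bmatrix}$ we get $H = \operatorname{diag}(2, 1)$, $\rho(S) = 1$, $K = 3$, and the observed one-step ratios stay below $\sqrt{2/3}$ for all directions of $r$.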
Note: The number of iterations is proportional to $K$. In our earlier estimate (6) we had $K = (\|A\|/\gamma)^2$, whereas we now obtain
$$K = \frac{\|H\|}{\gamma} + \left(\frac{\|S\|}{\gamma}\right)^2.$$
This shows that for a symmetric matrix $A = H$ the number of steps is proportional to the condition number. If we have a nonsymmetric $A = H + S$, then $K$ increases by $(\|S\|/\gamma)^2$. So we see that the quadratic term $(\|A\|/\gamma)^2$ in our earlier estimate is actually only caused by the antisymmetric part $S$.

Application to convection diffusion problem

Recall that $A = A_{\text{diff}} + A_{\text{conv}}$ with the symmetric matrix $H = A_{\text{diff}}$ and the antisymmetric matrix $S = A_{\text{conv}}$, and
$$C_1 h^d\|u\|^2 \le (A_{\text{diff}}u, u) \le C_2 h^{d-2}\|u\|^2,\qquad |(A_{\text{conv}}u, v)| \le C_3 h^{d-1}\|u\|\|v\|.$$
Therefore we have
$$\lambda_{\min}(H) \ge C_1 h^d,\qquad \lambda_{\max}(H) \le C_2 h^{d-2},\qquad \rho(S) = \|S\| \le C_3 h^{d-1},$$
yielding with (10)
$$K := \operatorname{cond}(H) + \left(\frac{\rho(S)}{\lambda_{\min}(H)}\right)^2 \le \frac{C_2}{C_1}h^{-2} + \left(\frac{C_3}{C_1}h^{-1}\right)^2 = Ch^{-2},\qquad q = \left(1 - C^{-1}h^2\right)^{1/2}.$$
This means that we need $Ch^{-2}$ steps of the iterative method to reduce the norm of the residual by a fixed factor.

Note: $C_3$ is proportional to $|b|$, so we obtain $K \le (C + C'|b|^2)\,h^{-2}$. So for a problem with strong convection the number of iterations can be very large.

Recall that the stiffness matrix $A$ is of size $n \times n$ with at most $cn$ nonzero elements, where $n \le ch^{-d} \le cN^d$ (for the meshsize $h \approx 1/N$). Therefore the work of a matrix-vector product is proportional to the number of nonzero matrix elements, $cN^d$. One step of the 1-step minimum residual method costs one matrix-vector product and a few inner products, so the work per step is $cN^d$. The number of steps is proportional to $h^{-2} \approx N^2$ if we want to achieve a residual with $\|r\| \le \delta$. Hence the total work for our iterative method with $q = 1 - ch^2$ is $CN^2 N^d$. If we had an iterative method with $q = 1 - ch$, we would obtain a total work of $CN\,N^d$ instead.

Summary: work for solving the convection-diffusion problem

| Method | d = 1 | d = 2 | d = 3 |
|---|---|---|---|
| Gaussian elimination (using band structure) | $N$ | $N^4$ | $N^7$ |
| Gaussian elimination with reordering | $N$ | $N^3$ | $N^7$ |
| 1-step min. res. method, $q = 1 - ch^2$ | $N^3$ | $N^4$ | $N^5$ |
| iterative method with $q = 1 - ch$ | $N^2$ | $N^3$ | $N^4$ |

Note that for $d = 1$ using an iterative method is pointless. For $d = 2$ the direct solver with reordering is better than the 1-step min. res. method. For $d = 3$ the iterative method is clearly better than the direct method.

In the case of a symmetric matrix we can construct an iterative method with $q = 1 - ch$. This is the conjugate gradient method, which we will discuss next.
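The $h^{-2}$ growth of the step count can be observed in a rough experiment (an illustration under our own assumptions, not a proof): the 1D stiffness matrix of $-u'' + \beta u'$ with linear elements on $(0,1)$, $f = 1$, $u^{(0)} = 0$, and a residual reduction by $10^{-6}$. The helper names are hypothetical. Only the residual is iterated, since $r^{(k+1)} = (I - \alpha_k A)r^{(k)}$ does not need $u^{(k)}$. Doubling $N$ should roughly quadruple the number of steps.

```python
import math


def conv_diff_1d(N, beta):
    """Diagonals (lower, diag, upper) of the stiffness matrix for
    -u'' + beta u' on (0,1), linear elements, h = 1/N; lower[i] = A_{i+1,i}."""
    h = 1.0 / N
    n = N - 1
    return ([-1.0 / h - beta / 2.0] * (n - 1),
            [2.0 / h] * n,
            [-1.0 / h + beta / 2.0] * (n - 1))


def tri_matvec(lower, diag, upper, x):
    """Tridiagonal matrix-vector product in O(n) work."""
    n = len(diag)
    y = [diag[i] * x[i] for i in range(n)]
    for i in range(n - 1):
        y[i] += upper[i] * x[i + 1]
        y[i + 1] += lower[i] * x[i]
    return y


def gmres1_steps(N, beta, reduction=1e-6, max_iter=100_000):
    """Number of 1-step minimum residual steps until ||r_k|| <= reduction * ||r_0||."""
    lower, diag, upper = conv_diff_1d(N, beta)
    r = [1.0 / N] * (N - 1)  # u_0 = 0, so r_0 = b = load vector for f = 1
    r0 = math.sqrt(sum(v * v for v in r))
    for k in range(max_iter):
        if math.sqrt(sum(v * v for v in r)) <= reduction * r0:
            return k
        Ar = tri_matvec(lower, diag, upper, r)
        alpha = sum(a * v for a, v in zip(Ar, r)) / sum(a * a for a in Ar)
        r = [v - alpha * a for v, a in zip(r, Ar)]
    return max_iter
```

Calling gmres1_steps(N, 1.0) for N = 8, 16, 32 shows the step count growing roughly like $N^2$, in line with $q \approx 1 - ch^2$.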
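A. Solving $F(u) = b$ using Richardson iteration or the minimal residual method

Lemma A.1. Let $V$ be a Hilbert space. Assume that the function $F: V \to V$ satisfies with constants $L, \gamma > 0$ for all $u, v \in V$
$$\|F(v) - F(u)\| \le L\|v - u\| \tag{11}$$
$$\langle F(v) - F(u), v - u\rangle \ge \gamma\|v - u\|^2 \tag{12}$$
Then the equation $F(u) = b$ with $b \in V$ has a unique solution $u \in V$. The inverse mapping satisfies for $b, c \in V$
$$\|F^{-1}(c) - F^{-1}(b)\| \le \gamma^{-1}\|c - b\|. \tag{13}$$

Proof. Consider the Richardson iteration $u_{k+1} = G(u_k)$ with $G(u) := u + \alpha(b - F(u))$, $\alpha > 0$. We claim that $G$ is a contraction if $\alpha$ is small: with $e := v - u$ we have
$$\|G(v) - G(u)\|^2 = \|e - \alpha(F(v) - F(u))\|^2 = \|e\|^2 - 2\alpha\langle F(v) - F(u), e\rangle + \alpha^2\|F(v) - F(u)\|^2 \le \left(1 - 2\alpha\gamma + \alpha^2L^2\right)\|e\|^2.$$
For $g(\alpha) := 1 - 2\alpha\gamma + \alpha^2L^2$ we have $g(0) = 1$ and $g'(0) < 0$, so $G$ is a contraction for sufficiently small $\alpha$. It is easy to see that $g(\alpha) < 1$ for $\alpha \in (0, 2\gamma/L^2)$; we can minimize $g(\alpha)$ by choosing $\alpha = \gamma/L^2$ and obtain $\|G(v) - G(u)\| \le (1 - \gamma^2/L^2)^{1/2}\|v - u\|$. By the contraction mapping theorem the equation $G(u) = u$, which is equivalent to $F(u) = b$, has a unique solution. We obtain (13) from
$$\gamma\|v - u\|^2 \le \langle F(v) - F(u), v - u\rangle \le \|F(v) - F(u)\|\,\|v - u\|.$$

If we know some (possibly nonoptimal) constants $L$, $\gamma$ satisfying (11), (12), the Richardson iteration $u_{k+1} = u_k + \alpha(b - F(u_k))$ can be used with $\alpha = \gamma/L^2$ to find an approximate solution of the nonlinear equation, and we have
$$\|u_{k+1} - u\| \le \left(1 - \frac{\gamma^2}{L^2}\right)^{1/2}\|u_k - u\|.$$
If we do not know the constants $\gamma$, $L$, we can use a line search for $\alpha > 0$ such that for $u_{k+1} = u_k + \alpha r_k$ the new residual $r_{k+1} = b - F(u_{k+1})$ has minimal norm, i.e., $f_k(\alpha) := \|b - F(u_k + \alpha r_k)\|^2$ becomes minimal:
$$\|r_{k+1}\|^2 = f_k(\alpha) = \|r_k + F(u_k) - F(u_k + \alpha r_k)\|^2 = \|r_k\|^2 - 2\alpha^{-1}\langle F(u_k + \alpha r_k) - F(u_k), \alpha r_k\rangle + \|F(u_k + \alpha r_k) - F(u_k)\|^2 \le \left(1 - 2\alpha\gamma + \alpha^2L^2\right)\|r_k\|^2.$$
This means that $f_k(\alpha) < f_k(0)$ for $\alpha \in (0, 2\gamma/L^2)$, and $f_k(\alpha) \le (1 - \gamma^2/L^2)f_k(0)$ for $\alpha = \gamma/L^2$. Assume that our approximate line search yields a value $\alpha$ such that $f_k(\alpha) \le q f_k(0)$ with some $q < 1$ independent of $k$; then
$$\|r_k\| = \|b - F(u_k)\| \le q^{k/2}\|r_0\| \to 0 \quad\text{as } k \to \infty.$$
Note that (13) implies $\|u_k - u\| \le \gamma^{-1}\|F(u_k) - b\|$, and hence $\|u_k - u\| \le \gamma^{-1}q^{k/2}\|r_0\| \to 0$ as $k \to \infty$.

Corollary A.2. Assume that $f: [t_0, T] \times \mathbb{R}^n \to \mathbb{R}^n$ satisfies with constants $L > 0$, $\hat L \in \mathbb{R}$ for all $t \in [t_0, T]$, $y, \tilde y \in \mathbb{R}^n$
$$\|f(t, \tilde y) - f(t, y)\| \le L\|\tilde y - y\|,\qquad \langle f(t, \tilde y) - f(t, y), \tilde y - y\rangle \le \hat L\|\tilde y - y\|^2.$$
Then for $t \in [t_0, T]$, $y_0 \in \mathbb{R}^n$ the backward Euler equation $y = y_0 + hf(t, y)$ has a unique solution $y \in \mathbb{R}^n$ if $h\hat L < 1$. In particular, for $\hat L \le 0$ (dissipative problem) there is a unique solution for any $h > 0$.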
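Corollary A.2 and the Richardson iteration from the proof of Lemma A.1 can be tried together on a scalar example. This sketch rests on assumptions of our own: the right-hand side $f(t, y) = -y - \sin y$ (Lipschitz with $L = 2$, one-sided constant $\hat L = 0$, hence dissipative) and the hypothetical helper backward_euler_step, which applies Richardson with $\alpha = \gamma/L_F^2$ to $F(y) = y - hf(t, y)$, where $\gamma = 1 - h\hat L = 1$ and $L_F = 1 + 2h$.

```python
import math


def f(t, y):
    """Hypothetical right-hand side f(t, y) = -y - sin(y) (t is unused).

    f_y = -1 - cos(y) lies in [-2, 0], so f is Lipschitz with L = 2 and
    has one-sided constant 0: the problem is dissipative."""
    return -y - math.sin(y)


def backward_euler_step(t, y0, h, iters=500):
    """Solve y = y0 + h f(t, y) with the Richardson iteration of Lemma A.1
    applied to F(y) = y - h f(t, y), using gamma = 1 and L_F = 1 + 2h."""
    alpha = 1.0 / (1.0 + 2.0 * h) ** 2  # alpha = gamma / L_F^2
    y = y0
    for _ in range(iters):
        y = y + alpha * (y0 - (y - h * f(t, y)))  # y <- y + alpha (y0 - F(y))
    return y
```

With $h = 1$ the contraction factor is at most $8/9$, so 500 iterations solve the equation to rounding accuracy. For much larger $h$ the guaranteed factor $(1 - \gamma^2/L_F^2)^{1/2}$ approaches 1 and many more iterations are needed, which is the drawback of Richardson with fixed $\alpha$ noted in the text.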
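Proof. Define $F: \mathbb{R}^n \to \mathbb{R}^n$ by $F(y) = y - hf(t, y)$. Then
$$\langle F(v) - F(u), v - u\rangle = \|v - u\|^2 - h\langle f(t, v) - f(t, u), v - u\rangle \ge (1 - h\hat L)\|v - u\|^2,$$
and $\|F(v) - F(u)\| \le (1 + hL)\|v - u\|$. Hence Lemma A.1 applies with $\gamma = 1 - h\hat L > 0$. Note that the assertion is independent of $L$.

Let $y_j$, $\tilde y_j$ be values at time $t_j$. Then the values at time $t_{j+1} = t_j + h$ are given by $y_{j+1} = F^{-1}(y_j)$, $\tilde y_{j+1} = F^{-1}(\tilde y_j)$ (with $t = t_{j+1}$ in the definition of $F$), and we obtain from (13) with $\gamma = 1 - h\hat L$
$$\|\tilde y_{j+1} - y_{j+1}\| \le \frac{1}{1 - h\hat L}\|\tilde y_j - y_j\|.$$

Corollary A.3. Let $V$ be a Hilbert space. Assume that the function $F: V \to V'$ satisfies with constants $L, \gamma > 0$ for all $u, v \in V$
$$\|F(v) - F(u)\|_{V'} \le L\|v - u\|,\qquad [F(v) - F(u)](v - u) \ge \gamma\|v - u\|^2.$$
Then the equation $F(u) = f$ with $f \in V'$ has a unique solution $u \in V$.

Proof. By the Riesz representation theorem there is a linear isometry $\varphi: V' \to V$ such that $f(v) = \langle\varphi f, v\rangle$ for all $v \in V$. Therefore we define $\tilde F := \varphi \circ F: V \to V$ and can apply Lemma A.1.

Corollary A.4 (Lax-Milgram). Let $V$ be a Hilbert space. Assume that the bilinear form $a: V \times V \to \mathbb{R}$ satisfies with constants $L, \gamma > 0$ for all $u, v \in V$
$$|a(u, v)| \le L\|u\|\|v\|,\qquad a(u, u) \ge \gamma\|u\|^2.$$
Then for every $f \in V'$ there is a unique $u \in V$ which satisfies $a(u, v) = f(v)$ for all $v \in V$, and we have $\|u\| \le \gamma^{-1}\|f\|_{V'}$.

Proof. Define $F: V \to V'$ by $F(u)(v) := a(u, v)$. By the definition of $V'$ the function $F$ satisfies the assumptions of Corollary A.3. The estimate for $u$ follows from $\gamma\|u\|^2 \le a(u, u) = f(u) \le \|f\|_{V'}\|u\|$.

Corollary A.5. Assume $A \in \mathbb{R}^{n\times n}$ satisfies for all $u \in \mathbb{R}^n$
$$\langle Au, u\rangle \ge \gamma\|u\|^2. \tag{14}$$
Then the equation $Au = b$ has for every $b \in \mathbb{R}^n$ a unique solution, and we have $\|A^{-1}\| \le \gamma^{-1}$. It can be found with the following iterative methods:

1. Richardson iteration: Let $L := \|A\|$. Then for $\alpha \in (0, 2\gamma/L^2)$ the iteration
$$u_{k+1} = u_k + \alpha(b - Au_k) \tag{15}$$
converges. In particular, for $\alpha = \gamma/L^2$ we have $\|u_{k+1} - u\| \le (1 - \gamma^2/L^2)^{1/2}\|u_k - u\|$. A drawback of the method is that we need to know some (possibly nonoptimal) constants $\gamma$, $L$ satisfying (14) and $\|A\| \le L$ in order to choose $\alpha$ (or we have to experiment with different values of $\alpha$).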
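2. 1-step minimum residual: We use (15) and choose $\alpha$ so that the norm of the new residual $r_{k+1} = b - Au_{k+1}$ becomes minimal:
$$\|r_{k+1}\|^2 = \|r_k - \alpha Ar_k\|^2 = \|r_k\|^2 - 2\alpha\langle Ar_k, r_k\rangle + \alpha^2\|Ar_k\|^2,$$
i.e.,
$$\alpha_k := \frac{\langle Ar_k, r_k\rangle}{\|Ar_k\|^2},\qquad\text{then}\qquad \|r_{k+1}\|^2 = \|r_k\|^2 - \frac{\langle Ar_k, r_k\rangle^2}{\|Ar_k\|^2} \le \left(1 - \frac{\gamma^2}{L^2}\right)\|r_k\|^2.$$
Therefore the residuals $r_k = A(u - u_k)$ converge, and
$$\|r_k\| \le \left(1 - \frac{\gamma^2}{L^2}\right)^{k/2}\|r_0\|,\qquad \|u_k - u\| \le \gamma^{-1}\left(1 - \frac{\gamma^2}{L^2}\right)^{k/2}\|r_0\|.$$
This method corresponds to the first step of the GMRES method, or to the GMRES(1) method, which is restarted after every step. The full GMRES method minimizes the residual over multiple directions, so the norm of the residual can only be lower. Hence the above estimates for $\|r_k\|$ and $\|u_k - u\|$ also hold for the GMRES method.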
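Both methods of Corollary A.5 can be compared on a small example. This sketch makes assumptions of our own: $A = \begin{bmatrix} 4 & 1 \\ -1 & 3 \end{bmatrix}$, $b = (1, 2)^\top$, $\gamma = 3 = \lambda_{\min}\left(\frac12(A + A^\top)\right)$, and the Frobenius norm as a (nonoptimal) upper bound $L \ge \|A\|$. The hypothetical helper residual_history runs iteration (15) from $u_0 = 0$, with either the fixed Richardson step $\alpha = \gamma/L^2$ or the minimum residual choice $\alpha_k$; both residual histories should respect the bound $(1 - \gamma^2/L^2)^{k/2}\|r_0\|$.

```python
import math


def matvec(A, x):
    """Dense matrix-vector product A x (A given as a list of rows)."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def residual_history(A, b, steps, alpha=None):
    """Residual norms of iteration (15), u <- u + alpha (b - A u), from u0 = 0.
    alpha=None selects the 1-step minimum residual choice in every step."""
    u = [0.0] * len(b)
    norms = []
    for _ in range(steps + 1):
        r = [bi - ai for bi, ai in zip(b, matvec(A, u))]
        norms.append(math.sqrt(dot(r, r)))
        if norms[-1] == 0.0:
            break
        Ar = matvec(A, r)
        a = dot(Ar, r) / dot(Ar, Ar) if alpha is None else alpha
        u = [ui + a * ri for ui, ri in zip(u, r)]
    return norms


A = [[4.0, 1.0], [-1.0, 3.0]]
b = [1.0, 2.0]
gamma = 3.0                                           # lambda_min of H = diag(4, 3)
L = math.sqrt(sum(a * a for row in A for a in row))   # Frobenius norm >= ||A||
q = math.sqrt(1.0 - gamma ** 2 / L ** 2)              # guaranteed factor per step
rich = residual_history(A, b, 30, alpha=gamma / L ** 2)
minres = residual_history(A, b, 30)
```

The minimum residual variant needs no knowledge of $\gamma$ or $L$, which is exactly the advantage over Richardson pointed out above.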