CME 302: NUMERICAL LINEAR ALGEBRA FALL 2005/06
LECTURE 9

GENE H. GOLUB

Date: November 25, 2005, version 1.1. Notes originally due to James Lambers. Edited by Lek-Heng Lim.

1. Error Analysis of Gaussian Elimination

In this section, we consider Gaussian elimination and perform a detailed error analysis, illustrating the analysis originally carried out by J. H. Wilkinson. The process of solving $Ax = b$ consists of three stages:

(1) Factoring $A = LU$, resulting in an approximate LU decomposition $A + E = \bar L\bar U$. We assume that partial pivoting is used.
(2) Solving $Ly = b$, or, numerically, computing $y$ such that $(\bar L + \delta\bar L)(y + \delta y) = b$.
(3) Solving $Ux = y$, or, numerically, computing $x$ such that $(\bar U + \delta\bar U)(x + \delta x) = y + \delta y$.

Combining these stages, we see that
$$b = (\bar L + \delta\bar L)(\bar U + \delta\bar U)(x + \delta x) = (\bar L\bar U + \delta\bar L\,\bar U + \bar L\,\delta\bar U + \delta\bar L\,\delta\bar U)(x + \delta x) = (A + E + \delta\bar L\,\bar U + \bar L\,\delta\bar U + \delta\bar L\,\delta\bar U)(x + \delta x) = (A + \delta A)(x + \delta x),$$
where $\delta A = E + \delta\bar L\,\bar U + \bar L\,\delta\bar U + \delta\bar L\,\delta\bar U$. In this analysis, we will view the computed solution $\bar x = x + \delta x$ as the exact solution to the perturbed problem $(A + \delta A)\bar x = b$. This perspective is the idea behind backward error analysis, which we will use to determine the size of the perturbation $\delta A$ and, eventually, to arrive at a bound for the error in the computed solution $\bar x$.

Let $A^{(k)}$ denote the matrix $A$ after $k-1$ steps of Gaussian elimination have been performed in exact arithmetic, where a step denotes the process of making all elements below the diagonal within a particular column equal to zero. Then the elements of $A^{(k+1)}$ are given by
$$a_{ij}^{(k+1)} = a_{ij}^{(k)} - m_{ik}\, a_{kj}^{(k)}, \qquad m_{ik} = \frac{a_{ik}^{(k)}}{a_{kk}^{(k)}}. \tag{1.1}$$
Let $B^{(k)}$ denote the matrix $A$ after $k-1$ steps of Gaussian elimination have been performed in floating-point arithmetic. Then the elements of $B^{(k+1)}$ are given by
$$b_{ij}^{(k+1)} = b_{ij}^{(k)} - s_{ik}\, b_{kj}^{(k)} + \epsilon_{ij}^{(k+1)}, \qquad s_{ik} = \mathrm{fl}\!\left(\frac{b_{ik}^{(k)}}{b_{kk}^{(k)}}\right). \tag{1.2}$$
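Recurrences (1.1) and (1.2) are exactly the inner loop of LU factorization with partial pivoting. The following minimal numpy sketch (the function name and the test matrix are mine, not from the notes) carries out (1.2), collects the multipliers $s_{ik}$ into $\bar L$, and measures the backward error $E = \bar L\bar U - PA$ directly:

import numpy as np

def lu_partial_pivot(A):
    # LU factorization with partial pivoting, following recurrence (1.2).
    # Returns P, L, U with L @ U approximately equal to P @ A; the
    # subdiagonal entries of L are the multipliers s_ik.
    B = np.array(A, dtype=float)
    n = B.shape[0]
    L = np.eye(n)
    P = np.eye(n)
    for k in range(n - 1):
        p = k + np.argmax(np.abs(B[k:, k]))   # pivot row: ensures |s_ik| <= 1
        B[[k, p]] = B[[p, k]]
        P[[k, p]] = P[[p, k]]
        L[[k, p], :k] = L[[p, k], :k]
        for i in range(k + 1, n):
            s = B[i, k] / B[k, k]             # s_ik = fl(b_ik^(k) / b_kk^(k))
            L[i, k] = s
            B[i, k:] -= s * B[k, k:]          # eq. (1.2); eps_ij^(k+1) is the rounding
            B[i, k] = 0.0
    return P, L, np.triu(B)

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 8))
P, L, U = lu_partial_pivot(A)
E = L @ U - P @ A                             # backward error of the factorization
print("max |e_ij| =", np.abs(E).max())
print("machine eps * max |a_ij| =", np.finfo(float).eps * np.abs(A).max())

For a random matrix, the largest entry of $E$ comes out comparable to machine epsilon times the size of $A$, in line with the bounds derived below.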
For $j \ge i$, we have
$$b_{ij}^{(2)} = b_{ij}^{(1)} - s_{i1}\, b_{1j}^{(1)} + \epsilon_{ij}^{(2)},$$
$$b_{ij}^{(3)} = b_{ij}^{(2)} - s_{i2}\, b_{2j}^{(2)} + \epsilon_{ij}^{(3)},$$
$$\vdots$$
$$b_{ij}^{(i)} = b_{ij}^{(i-1)} - s_{i,i-1}\, b_{i-1,j}^{(i-1)} + \epsilon_{ij}^{(i)}.$$
Combining these equations yields
$$b_{ij}^{(i)} = b_{ij}^{(1)} - \sum_{k=1}^{i-1} s_{ik}\, b_{kj}^{(k)} + \sum_{k=2}^{i} \epsilon_{ij}^{(k)}.$$
Rearranging, we obtain
$$b_{ij}^{(1)} + e_{ij} = b_{ij}^{(i)} + \sum_{k=1}^{i-1} s_{ik}\, b_{kj}^{(k)}, \qquad j \ge i, \tag{1.3}$$
where $e_{ij} := \sum_{k=2}^{i} \epsilon_{ij}^{(k)}$.

For $i > j$, the same telescoping gives
$$b_{ij}^{(j)} = b_{ij}^{(1)} - \sum_{k=1}^{j-1} s_{ik}\, b_{kj}^{(k)} + \sum_{k=2}^{j} \epsilon_{ij}^{(k)},$$
and at step $j$ the element is eliminated, where
$$s_{ij} = \mathrm{fl}\!\left(\frac{b_{ij}^{(j)}}{b_{jj}^{(j)}}\right) = \frac{b_{ij}^{(j)}}{b_{jj}^{(j)}} + \eta_{ij},$$
and therefore
$$0 = b_{ij}^{(j+1)} = b_{ij}^{(j)} - s_{ij}\, b_{jj}^{(j)} + \epsilon_{ij}^{(j+1)}, \qquad \epsilon_{ij}^{(j+1)} = b_{jj}^{(j)}\, \eta_{ij}.$$
It follows that
$$b_{ij}^{(1)} + e_{ij} = \sum_{k=1}^{j} s_{ik}\, b_{kj}^{(k)}, \qquad i > j, \tag{1.4}$$
where now $e_{ij} := \sum_{k=2}^{j+1} \epsilon_{ij}^{(k)}$.

From (1.3) and (1.4), we obtain
$$\bar L\bar U = \begin{pmatrix} 1 & & & \\ s_{21} & 1 & & \\ \vdots & & \ddots & \\ s_{n1} & \cdots & s_{n,n-1} & 1 \end{pmatrix} \begin{pmatrix} b_{11}^{(1)} & b_{12}^{(1)} & \cdots & b_{1n}^{(1)} \\ & b_{22}^{(2)} & \cdots & b_{2n}^{(2)} \\ & & \ddots & \vdots \\ & & & b_{nn}^{(n)} \end{pmatrix} = A + E,$$
where $E = (e_{ij})$.

For the individual floating-point operations, we have
$$s_{ik} = \mathrm{fl}\!\left(\frac{b_{ik}^{(k)}}{b_{kk}^{(k)}}\right) = \frac{b_{ik}^{(k)}}{b_{kk}^{(k)}}\,(1 + \eta_{ik}), \qquad |\eta_{ik}| \le u,$$
$$\mathrm{fl}(s_{ik}\, b_{kj}^{(k)}) = s_{ik}\, b_{kj}^{(k)}\,(1 + \theta_{ij}^{(k)}), \qquad |\theta_{ij}^{(k)}| \le u,$$
and so
$$b_{ij}^{(k+1)} = \mathrm{fl}\big(b_{ij}^{(k)} - \mathrm{fl}(s_{ik}\, b_{kj}^{(k)})\big) = \big(b_{ij}^{(k)} - s_{ik}\, b_{kj}^{(k)}(1 + \theta_{ij}^{(k)})\big)\,(1 + \varphi_{ij}^{(k)}), \qquad |\varphi_{ij}^{(k)}| \le u.$$
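The identity $A + E = \bar L\bar U$ can be made tangible numerically: compute the factors in floating point, then evaluate $\bar L\bar U - A$ in exact rational arithmetic, so that $E$ itself is obtained without any further rounding. A small sketch of this idea (my own construction, not from the notes; the diagonal shift is only there to keep the pivots safe without row interchanges):

import numpy as np
from fractions import Fraction

rng = np.random.default_rng(1)
n = 5
A = rng.standard_normal((n, n)) + 5 * np.eye(n)   # keeps pivots well away from zero

# Floating-point elimination without pivoting (pivoting merely permutes rows).
U = A.copy()
L = np.eye(n)
for k in range(n - 1):
    for i in range(k + 1, n):
        L[i, k] = U[i, k] / U[k, k]
        U[i, k:] -= L[i, k] * U[k, k:]
        U[i, k] = 0.0

# Every double is an exact rational, so promoting the computed factors to
# Fractions lets us evaluate E = L*U - A with no rounding at all.
to_frac = lambda M: [[Fraction(float(x)) for x in row] for row in M]
Lf, Uf, Af = to_frac(L), to_frac(U), to_frac(A)
E = [[sum(Lf[i][k] * Uf[k][j] for k in range(n)) - Af[i][j]
      for j in range(n)] for i in range(n)]
print("max |e_ij| =", float(max(abs(e) for row in E for e in row)))
print("u * max |a_ij| ~", np.finfo(float).eps * np.abs(A).max())

The entries of $E$ come out a small multiple of $u$, even though no exact solution was ever needed for the comparison.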
Manipulating the floating-point model above, we obtain
$$\epsilon_{ij}^{(k+1)} = b_{ij}^{(k+1)}\,\frac{\varphi_{ij}^{(k)}}{1 + \varphi_{ij}^{(k)}} - s_{ik}\, b_{kj}^{(k)}\, \theta_{ij}^{(k)}.$$
With partial pivoting, $|s_{ik}| \le 1$, provided that $\mathrm{fl}(a/b) \le 1$ whenever $|a| \le |b|$; in most modern implementations of floating-point arithmetic, this is in fact the case. It follows that
$$|\epsilon_{ij}^{(k+1)}| \le |b_{ij}^{(k+1)}|\,\frac{u}{1-u} + |b_{kj}^{(k)}|\, u.$$

How large can the elements of $B^{(k)}$ be? Returning to exact arithmetic, we assume that $|a_{ij}| \le a$; since partial pivoting guarantees $|m_{ik}| \le 1$, (1.1) yields
$$|a_{ij}^{(2)}| \le |a_{ij}^{(1)}| + |a_{1j}^{(1)}| \le 2a, \qquad |a_{ij}^{(3)}| \le 4a, \qquad \ldots, \qquad |a_{ij}^{(n)}| \le 2^{n-1} a.$$
We can show that a similar result holds in floating-point arithmetic:
$$|b_{ij}^{(k)}| \le 2^{k-1} a + O(u).$$
This upper bound is achievable (by Hadamard matrices), but in practice it rarely occurs.

2. Error in the LU Factorization

Recall from last time that we were analyzing the error in solving $Ax = b$ using backward error analysis, in which we assume that our computed solution $\bar x = x + \delta x$ is the exact solution to the perturbed problem
$$(A + \delta A)\bar x = b,$$
where $\delta A$ is a perturbation of the form
$$\delta A = E + \bar L\,\delta\bar U + \delta\bar L\,\bar U + \delta\bar L\,\delta\bar U,$$
and the following relationships hold:
(1) $A + E = \bar L\bar U$;
(2) $(\bar L + \delta\bar L)(y + \delta y) = b$;
(3) $(\bar U + \delta\bar U)(x + \delta x) = y + \delta y$.

We concluded that when partial pivoting is used, the entries of $\bar U$ are bounded:
$$|b_{ij}^{(k)}| \le 2^{k-1} a + O(u),$$
where $k$ is the number of steps of Gaussian elimination that affect the element and $a$ is an upper bound on the elements of $A$. For complete pivoting, Wilkinson gave a bound in terms of the growth factor, denoted $G$. Until 1990 it was conjectured that $G \le n$ for complete pivoting; the conjecture was shown to be true for $n \le 5$, but examples have since been constructed for $n > 5$ in which $G > n$. In any event, we have the following bound for the entries of $E$:
$$|E| \le 2uGa \begin{pmatrix} 0 & 0 & 0 & \cdots & 0 \\ 1 & 1 & 1 & \cdots & 1 \\ 1 & 2 & 2 & \cdots & 2 \\ 1 & 2 & 3 & \cdots & 3 \\ \vdots & & & \ddots & \vdots \\ 1 & 2 & 3 & \cdots & n-1 \end{pmatrix} + O(u^2),$$
where the inequality and absolute value are understood entrywise.
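The exponential bound $2^{k-1}a$ is sharp: the classical worst-case example for partial pivoting (a matrix with unit diagonal, $-1$ below the diagonal, and a last column of ones) doubles its last column at every step. The sketch below (the example is standard, but the code and function name are mine) tracks the growth $\max_{i,j,k} |b_{ij}^{(k)}| / \max_{i,j} |a_{ij}|$ directly:

import numpy as np

def growth_factor(A):
    # Ratio max_{i,j,k} |b_ij^(k)| / max_{i,j} |a_ij| under partial pivoting.
    B = np.array(A, dtype=float)
    n = B.shape[0]
    g = np.abs(B).max()
    for k in range(n - 1):
        p = k + np.argmax(np.abs(B[k:, k]))
        B[[k, p]] = B[[p, k]]
        for i in range(k + 1, n):
            B[i, k:] -= (B[i, k] / B[k, k]) * B[k, k:]
            B[i, k] = 0.0
        g = max(g, np.abs(B).max())
    return g / np.abs(np.asarray(A, dtype=float)).max()

# Worst-case matrix: unit diagonal, -1 below the diagonal, ones in the last column.
n = 10
W = np.eye(n) - np.tril(np.ones((n, n)), -1)
W[:, -1] = 1.0
print(growth_factor(W), "vs 2^(n-1) =", 2.0 ** (n - 1))

# A random matrix, by contrast, typically shows very modest growth.
print(growth_factor(np.random.default_rng(0).standard_normal((n, n))))

Running this prints $2^{n-1}$ for the contrived matrix and only a few units of growth for the random one, illustrating why the worst case rarely occurs in practice.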
3. Error Analysis of Forward Substitution

We now study the process of forward substitution for solving the lower triangular system
$$\begin{pmatrix} t_{11} & & 0 \\ \vdots & \ddots & \\ t_{n1} & \cdots & t_{nn} \end{pmatrix}\begin{pmatrix} u_1 \\ \vdots \\ u_n \end{pmatrix} = \begin{pmatrix} h_1 \\ \vdots \\ h_n \end{pmatrix}.$$
Using forward substitution, we obtain
$$u_1 = \frac{h_1}{t_{11}},$$
and, in general,
$$u_k = \frac{h_k - t_{k1} u_1 - \cdots - t_{k,k-1} u_{k-1}}{t_{kk}}.$$
In floating-point arithmetic,
$$\mathrm{fl}(u_k) = \frac{h_k(1 + \epsilon_k)(1 + \eta_k) - \sum_{i=1}^{k-1} t_{ki} u_i (1 + \xi_{ki})(1 + \epsilon_k)(1 + \eta_k)}{t_{kk}} = \frac{h_k - \sum_{i=1}^{k-1} t_{ki} u_i (1 + \xi_{ki})}{t_{kk}}\,(1 + \epsilon_k)(1 + \eta_k),$$
which yields
$$\sum_{i=1}^{k} u_i\, t_{ki}\, (1 + \lambda_{ki}) = h_k,$$
or, in matrix notation,
$$\left[\, T + \begin{pmatrix} \lambda_{11} t_{11} & & \\ \lambda_{21} t_{21} & \lambda_{22} t_{22} & \\ \vdots & & \ddots \end{pmatrix} \right] u = h.$$
In other words, the computed solution $u$ is the exact solution to the perturbed problem $(T + \delta T)u = h$, where
$$|\delta T| \le u \begin{pmatrix} |t_{11}| & & \\ |t_{21}| & 2|t_{22}| & \\ \vdots & & \ddots \\ (n-1)|t_{n1}| & \cdots & 2|t_{nn}| \end{pmatrix} + O(u^2).$$
Note that the perturbation $\delta T$ actually depends on $h$.
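As a complement to the analysis, here is a direct implementation of the recurrence (a minimal sketch; the Oettli-Prager-style backward error measure at the end is my addition, not part of the notes):

import numpy as np

def forward_substitution(T, h):
    # Solve T u = h for lower triangular T:
    # u_k = (h_k - t_k1 u_1 - ... - t_{k,k-1} u_{k-1}) / t_kk
    n = len(h)
    u = np.zeros(n)
    for k in range(n):
        u[k] = (h[k] - T[k, :k] @ u[:k]) / T[k, k]
    return u

rng = np.random.default_rng(0)
T = np.tril(rng.standard_normal((8, 8))) + 8 * np.eye(8)  # safely nonsingular
h = rng.standard_normal(8)
u = forward_substitution(T, h)

# Componentwise backward error: the smallest eps such that (T + dT) u = h
# with |dT| <= eps |T|.  It should be of order the unit roundoff.
res = np.abs(T @ u - h)
print("backward error ~", (res / (np.abs(T) @ np.abs(u))).max())

The printed backward error is of order $u$, matching the entrywise bound on $\delta T$ above.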
4. Bounding the Perturbation in A

Recall that our computed solution $\bar x = x + \delta x$ solves
$$(A + \delta A)\bar x = b,$$
where $\delta A$ is a perturbation of the form
$$\delta A = E + \bar L\,\delta\bar U + \delta\bar L\,\bar U + \delta\bar L\,\delta\bar U.$$
For partial pivoting, $|l_{ij}| \le 1$, and we have the bounds
$$\max_{i,j} |\delta\bar l_{ij}| \le nu + O(u^2), \qquad \max_{i,j} |\delta\bar u_{ij}| \le nuGa + O(u^2),$$
where $a = \max_{i,j} |a_{ij}|$ and $G$ is the growth factor for partial pivoting. Putting our bounds together, we have
$$\max_{i,j} |\delta a_{ij}| \le \max_{i,j} |e_{ij}| + \max_{i,j} |(\bar L\,\delta\bar U)_{ij}| + \max_{i,j} |(\delta\bar L\,\bar U)_{ij}| + \max_{i,j} |(\delta\bar L\,\delta\bar U)_{ij}| \le 2uGan + n^2 Gau + n^2 Gau + O(u^2),$$
from which it follows that
$$\max_{i,j} |\delta a_{ij}| \le 2n(n+1)uGa + O(u^2).$$
We conclude that Gaussian elimination is backward stable: apart from the growth factor $G$, the perturbation is a modest (low-degree polynomial in $n$) multiple of $u$ relative to the largest entry of $A$.

5. Bounding the Error in the Solution

Let $\bar x = x + \delta x$ be the computed solution. Then, from $(A + \delta A)\bar x = b$, we obtain
$$\delta A\,\bar x = b - A\bar x = r,$$
where $r$ is called the residual vector. From our previous analysis,
$$\frac{\|r\|}{\|\bar x\|} \le \|\delta A\| \le 2n^2(n+1)Gau + O(u^2).$$
Also, recall that
$$\frac{\|\delta x\|}{\|x\|} \le \frac{\kappa(A)}{1 - \kappa(A)\,\|\delta A\|/\|A\|}\,\frac{\|\delta A\|}{\|A\|}.$$
Measuring matrices in the norm $\|A\| = n \max_{i,j} |a_{ij}|$, we have $\|A\| = na$, so
$$\frac{\|\delta A\|}{\|A\|} \le 2n(n+1)Gu + O(u^2).$$
Note that if $\kappa(A)$ and $G$ are large, our solution can be very inaccurate. The important factors in the accuracy of the computed solution are:
- the growth factor $G$;
- the condition number $\kappa(A)$;
- the unit roundoff $u$.
In particular, $\kappa(A)$ must be large relative to $1/u$ in order to be troublesome. For example, if $\kappa = 10^2$ and $u = 10^{-3}$, then $\kappa u = 10^{-1}$ and the computed solution may lose most of its accuracy, whereas if $\kappa = 10^2$ and $u = 10^{-50}$, then $\kappa u = 10^{-48}$ and the conditioning is harmless.

6. Iterative Refinement

The process of iterative refinement proceeds as follows to find a solution to $Ax = b$:
$$x^{(0)} = 0; \qquad r^{(i)} = b - Ax^{(i)}, \qquad A\delta^{(i)} = r^{(i)}, \qquad x^{(i+1)} = x^{(i)} + \delta^{(i)}, \qquad i = 0, 1, 2, \ldots$$
A small numerical illustration follows.
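In this sketch of the iteration (my own illustration, not from the notes), the factorization and the correction solves are done in single precision, while the residual $r^{(i)}$ is accumulated in double precision, anticipating the remark at the end of this section:

import numpy as np
from scipy.linalg import lu_factor, lu_solve

rng = np.random.default_rng(0)
n = 100
A = rng.standard_normal((n, n))
x_true = rng.standard_normal(n)
b = A @ x_true

# Factor once in single precision; u_single ~ 6e-8 plays the role of u.
lu, piv = lu_factor(A.astype(np.float32))

x = np.zeros(n)
for i in range(6):
    r = b - A @ x                                       # residual in double precision
    d = lu_solve((lu, piv), r.astype(np.float32))       # correction in single precision
    x = x + d.astype(np.float64)
    print(i, np.linalg.norm(x - x_true) / np.linalg.norm(x_true))

The relative error shrinks by a roughly constant factor $\tau$ per iteration until it stagnates near double-precision roundoff, provided $\kappa(A)\,u_{\text{single}}$ is comfortably below 1.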
Numerically, the steps of this iteration become
$$(A + \delta A^{(i)})\,\delta^{(i)} = (I + E^{(i)})\, r^{(i)}, \qquad x^{(i+1)} = (I + F^{(i)})(x^{(i)} + \delta^{(i)}),$$
where the matrices $E^{(i)}$ and $F^{(i)}$ account for roundoff error. Let $z^{(i)} = x - x^{(i)}$, and note that $r^{(i)} = b - Ax^{(i)} = Az^{(i)}$. Then
$$x^{(i+1)} - x = (I + F^{(i)})(x^{(i)} + \delta^{(i)}) - x = (I + F^{(i)})(x^{(i)} - x) + F^{(i)} x + (I + F^{(i)})\,\delta^{(i)},$$
and since $\delta^{(i)} = (I + A^{-1}\delta A^{(i)})^{-1}\big(z^{(i)} + A^{-1} E^{(i)} A\, z^{(i)}\big)$, this equals
$$(I + F^{(i)})\left[-z^{(i)} + (I + A^{-1}\delta A^{(i)})^{-1} z^{(i)} + (I + A^{-1}\delta A^{(i)})^{-1}(A^{-1}E^{(i)}A)\,z^{(i)}\right] + F^{(i)} x = (I + F^{(i)})(I + A^{-1}\delta A^{(i)})^{-1}\big(-A^{-1}\delta A^{(i)}\, z^{(i)} + A^{-1}E^{(i)}A\, z^{(i)}\big) + F^{(i)} x,$$
which we rewrite as
$$z^{(i+1)} = K^{(i)} z^{(i)} + c^{(i)}.$$
Taking norms yields
$$\|z^{(i+1)}\| \le \|K^{(i)}\|\,\|z^{(i)}\| + \|c^{(i)}\|.$$
Under the assumptions
$$\|K^{(i)}\| \le \tau, \qquad \|c^{(i)}\| \le \sigma\|x\|,$$
we obtain
$$\|z^{(i+1)}\| \le \tau\|z^{(i)}\| + \sigma\|x\| \le \tau^{i+1}\|z^{(0)}\| + \sigma(1 + \tau + \cdots + \tau^i)\|x\| \le \tau^{i+1}\|z^{(0)}\| + \sigma\,\frac{1 - \tau^{i+1}}{1 - \tau}\,\|x\|.$$
Assuming $\|A^{-1}\delta A^{(i)}\| \le \alpha$, $\|E^{(i)}\| \le \omega$, and $\|F^{(i)}\| \le \epsilon$, we may take
$$\tau = \frac{(1 + \epsilon)(\alpha + \kappa(A)\,\omega)}{1 - \alpha}, \qquad \sigma = \epsilon.$$
For sufficiently large $i$, we have
$$\frac{\|z^{(i)}\|}{\|x\|} \le \frac{\epsilon}{1 - \tau} + O(\epsilon^2).$$
From
$$1 - \tau = \frac{(1 - \alpha) - (1 + \epsilon)(\alpha + \kappa(A)\,\omega)}{1 - \alpha}$$
we obtain
$$\frac{1}{1 - \tau} = \frac{1 - \alpha}{(1 - \alpha) - (1 + \epsilon)(\alpha + \kappa(A)\,\omega)} \approx \frac{1 - \alpha}{1 - 2\alpha - \kappa(A)\,\omega}.$$
Therefore, $1/(1 - \tau) \le 2$ whenever $\alpha \le \frac{1}{3} - \frac{2}{3}\kappa(A)\,\omega$, approximately. It can be shown that if the vector $r^{(k)}$ is computed using double or extended precision, then $x^{(k)}$ converges to a solution in which almost all digits are correct, provided $\kappa(A)\,u \ll 1$.

Department of Computer Science, Gates Building 2B, Room 280, Stanford, CA 94305-9025
E-mail address: golub@stanford.edu