Chapter 2

Solving Systems of Equations

A large number of real-life applications which are resolved through mathematical modeling end up taking the form of the following very simple looking matrix system

    Ax = b                                                        (2.1)

Here A represents a known m × n matrix and b a known m-vector, while the vector x represents the n unknowns. Since a large variety of problems can be transformed into this general formulation, a number of methods have been developed which can produce exact or approximate solutions for this problem. For systems where m = n the obvious solution, which is also the simplest, is to find the inverse of the matrix A in order to write the solution as x = A^{-1} b.

One important aspect to which we pay particular attention when performing any numerical computation is the computational cost. Computational cost refers to the number of additions, subtractions, multiplications and divisions that must be performed in the computer in order for us to obtain the desired result. When the size of the matrix is sufficiently large, simply finding the inverse of A (if it exists) is not the most effective way to solve this problem. Computing the inverse has a very high computational cost!

Alternatively, you might recall from your early algebra classes elimination and pivoting methods such as Gaussian elimination with backward substitution, or the closely related Gauss-Jordan method. These methods require O(n^3) operations for matrices of size n × n. Thus as the size of the matrix increases the cost in operations skyrockets. Instead, alternate, more effective techniques are used in practice. One common procedure is to produce a factorized version of A. That idea can reduce the operational cost of solving for x from order n^3 to order n^2. This translates to almost a 99% reduction in calculations assuming that the matrices are larger than 100 × 100 (not unusual for applications nowadays).
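The claimed savings can be checked with back-of-the-envelope arithmetic (a sketch, not a measurement): for n = 100, an O(n^2) solve does about 1% of the work of an O(n^3) solve.

```python
# Rough operation-count comparison for n = 100: solving with an existing
# factorization costs about n^2 operations versus n^3 for a full solve.
n = 100
full, factored = n**3, n**2
saving = 1 - factored / full
print(saving)   # -> 0.99, i.e. roughly a 99% reduction in work
```

This is of course only the leading-order count; the constants in front of n^3 and n^2 are discussed below.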
Unfortunately the operational cost of producing the factors of A in the first place is still of order n^3 anyway. So overall we have not really gained much... Well, that is not quite true. There is a benefit: once the factors are available, each additional right-hand side b can be solved in only O(n^2) operations. For the details, however, you should read further on regarding these methods below.

2.1 Gaussian elimination

We begin by providing an outline for performing Gaussian elimination, which you learned in your introductory linear algebra classes. We will subsequently improve this basic algorithm into the more
FMN050 Spring 2015. Claus Führer and Alexandros Sopasakis

efficient methods which were hinted at above and which we will explain in more detail in the sections below.

One key aspect of the method which we need to emphasize is numerical stability. We could easily write down a naive Gaussian elimination and regrettably obtain completely wrong solutions! Numerical stability, or the lack of it, depends on how you perform the required operations in order to preserve as much numerical accuracy as possible. To avoid such numerical issues we must make sure that the largest numbers in a given column are used as denominators in the divisions which must be performed. To achieve this we perform row operations in order to place the largest such element of each column in the pivot position of the augmented matrix.

Keep in mind two important points about Gaussian elimination: a) if the matrix A is singular it is not possible to complete the method, and b) Gaussian elimination can be applied to any m × n matrix. As a result it is a general method and not limited to just square matrices. Note also that we should perform Gaussian elimination on the augmented matrix: the m × (n+1) matrix which contains all of matrix A with the vector b attached as a last column.

Pseudo-code for Gaussian elimination into row-echelon form

1. Main loop in k = 1, 2, ..., m.
2. Find the largest element in absolute value in column k (at or below row k) and call it max(k).
3. If max(k) = 0 then stop. The matrix is singular.
4. Swap rows in order to place the row containing the largest element of column k in row k. This ensures numerical stability.
5. Do the following for all rows below the pivot. Loop in i = k+1, k+2, ..., m. Compute the multiplier m(i,k) = A(i,k)/A(k,k) once for each row i, before it is overwritten.
6. Do the following for all elements in the current row. For j = k+1, ..., n.
7. A(i, j) = A(i, j) - A(k, j) m(i,k)
8. Fill the remaining lower triangular part of the matrix with zeros: A(i, k) = 0.
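The steps above can be sketched directly in Python (pure Python lists, no external libraries; the function name and the small test system are our own choices):

```python
def gaussian_elimination(A, b):
    """Reduce the augmented matrix [A | b] to row-echelon form using
    partial pivoting, following the pseudo-code above.
    A is a list of m rows of length n; b is a list of m values.
    Returns the reduced augmented m x (n+1) matrix."""
    m, n = len(A), len(A[0])
    M = [row[:] + [bi] for row, bi in zip(A, b)]    # build the augmented matrix
    for k in range(min(m, n)):
        # Steps 2-4: find the largest pivot candidate in column k and swap it up.
        p = max(range(k, m), key=lambda i: abs(M[i][k]))
        if M[p][k] == 0.0:
            raise ValueError("matrix is singular")
        M[k], M[p] = M[p], M[k]
        # Steps 5-8: eliminate below the pivot.
        for i in range(k + 1, m):
            mult = M[i][k] / M[k][k]     # multiplier, computed once per row
            for j in range(k + 1, n + 1):
                M[i][j] -= M[k][j] * mult
            M[i][k] = 0.0                # fill in the exact zero
    return M
```

Note that the multiplier is computed before the update loop touches the row, exactly as the pseudo-code requires; computing it inside the j loop would use already-modified entries.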
As already discussed in the introduction we are particularly interested in methods which are efficient. In that respect the number of operations performed during the computation is of great interest, and we must count the number of additions, subtractions, multiplications and divisions required in order to completely solve the problem above using Gaussian elimination. The total number of divisions above is n(n-1)/2, the number of multiplications is n(n-1)(2n-2)/6, and the number of additions/subtractions is n(n-1)(2n-1)/6. The overall cost of Gaussian elimination therefore is O(2n^3/3). The big-O notation is used to indicate that the largest term in the total number of operations for this method is 2n^3/3.

Now we must also undertake the task of back-substitution in order to find the actual solution x of this system. This however is a relatively easy computational task. We provide a short pseudo-code below as well. We assume for now that we have a system of the form Ux = b where the matrix U is an upper triangular matrix of order m × n.

Pseudo-code for back-substitution
1. Main loop for i = m, m-1, ..., 1.
2. If U(i, i) = 0 then stop. The matrix is singular.
3. Construct b(i) = b(i) - sum_{j=i+1}^{n} U(i, j) x(j)
4. Solve x(i) = b(i)/U(i, i)

The number of operations for back-substitution is as follows: n divisions, n(n-1)/2 multiplications and n(n-1)/2 additions and subtractions. So clearly the highest operational cost is of order O(n^2). Therefore the overall cost of solving the system Ax = b is still of order O(n^3). Again, we believe that we can improve slightly on the efficiency of our methodology by considering a factorization of the matrix A instead. We do this next.

2.2 LU factorization - Doolittle's version

In the next method which we examine now we factor the matrix A into two other matrices: a lower triangular matrix L and an upper triangular matrix U such that

    A = LU

The overall idea for solving the system Ax = b will be as follows: we start by replacing the matrix A with its factors LU. Thus we can write the system as

    LUx = b

We now assign the product Ux to a new variable y. Thus LUx = b becomes

    Ly = b   where   y = Ux.

Since L is a lower triangular matrix the system Ly = b is almost trivial to solve for the unknowns y. Once we find all the values of y then we can start solving the system

    Ux = y

Note that this system is also very easy to solve since U is an upper triangular matrix. Thus finding x with this method is also very easy. The only thing left to do is to actually compute the lower triangular matrix L and the upper triangular matrix U for which A = LU. This is accomplished by the usual Gaussian elimination method, applied only up to the point of obtaining an upper triangular matrix (without the back-substitution). Let us look at a simple example:

Example Solve the following matrix system using an LU factorization.

    [ 1  2  3 ] [ x1 ]   [ 0 ]
    [ 3  4  1 ] [ x2 ] = [ 6 ]
    [ 1  0  1 ] [ x3 ]   [ 1 ]

Solution The main part will be to produce the LU factorization. Once this is done then solving the system
will be easy. To produce the factorization we start with the usual Gaussian elimination method. For ease of notation we denote by R_i the i-th row of A. Then to create zeros below the element a(1,1) we simply do the following: -3R_1 + R_2 -> R_2 and -R_1 + R_3 -> R_3,

    [ 1   2   3 ]
    [ 0  -2  -8 ]
    [ 0  -2  -2 ]

Last we create a zero below a(2,2) via -R_2 + R_3 -> R_3,

    [ 1   2   3 ]
    [ 0  -2  -8 ]
    [ 0   0   6 ]

This procedure remarkably has already produced our required matrices L and U from A. In fact the matrices are

    A = LU = [ 1  0  0 ] [ 1   2   3 ]
             [ 3  1  0 ] [ 0  -2  -8 ]
             [ 1  1  1 ] [ 0   0   6 ]

Do the multiplication to check the result! How did we obtain the matrix L? Note that L is simply the matrix containing all the multipliers by which we multiplied in order to create U through the Gaussian elimination. The diagonal elements of L are always supposed to be 1 for the Doolittle method, so we do not need to compute them.

Let us now revisit the original system Ax = b. Given L and U we can easily solve the original system as follows: first solve Ly = b,

    [ 1  0  0 ] [ y1 ]   [ 0 ]
    [ 3  1  0 ] [ y2 ] = [ 6 ]
    [ 1  1  1 ] [ y3 ]   [ 1 ]

Top down you can almost read off the solution as y1 = 0, y2 = 6 and y3 = -5. Now you can solve the second part, Ux = y, for x,

    [ 1   2   3 ] [ x1 ]   [  0 ]
    [ 0  -2  -8 ] [ x2 ] = [  6 ]
    [ 0   0   6 ] [ x3 ]   [ -5 ]

This time the solution is read from the bottom up as x3 = -5/6, x2 = 1/3 and x1 = 11/6. The following pseudo-code outlines this procedure.

Pseudo-code for LU

1. Input the matrix A, and the diagonal elements of L (i.e. ones).
2. Let u(1,1) = a(1,1)/l(1,1). If l(1,1)u(1,1) = 0 then LU factorization is not possible and STOP.
3. For j = 2, ..., n let u(1,j) = a(1,j)/l(1,1) and l(j,1) = a(j,1)/u(1,1).
4. For i = 2, 3, ..., n do
      Let u(i,i) = ( a(i,i) - sum_{k=1}^{i-1} l(i,k) u(k,i) ) / l(i,i)
      If l(i,i)u(i,i) = 0 then STOP. Print: factorization is not possible.
      For j = i+1, ..., n
         Let u(i,j) = ( a(i,j) - sum_{k=1}^{i-1} l(i,k) u(k,j) ) / l(i,i)
         Let l(j,i) = ( a(j,i) - sum_{k=1}^{i-1} l(j,k) u(k,i) ) / u(i,i)
5. Let u(n,n) = a(n,n) - sum_{k=1}^{n-1} l(n,k) u(k,n). If l(n,n)u(n,n) = 0 then the factorization A = LU exists, but A is a singular matrix!
6. Print out all L and U elements.

Once you have the factorization then you can solve the matrix system with the following very simple substitution scheme.

Pseudo-code for solution of Ax = b

1. First solve Ly = b.
2. For i = 1, 2, ..., n do
      y(i) = ( b(i) - sum_{j=1}^{i-1} l(i,j) y(j) ) / l(i,i)
3. Now solve Ux = y by back-substitution in exactly the same way.
4. For i = n, n-1, ..., 1 do
      x(i) = ( y(i) - sum_{j=i+1}^{n} u(i,j) x(j) ) / u(i,i)

There are a couple of results which are interesting since they give us general criteria under which these methods are applicable. The following definition is necessary first.

Definition 2.2.1. The n × n matrix A is said to be strictly diagonally dominant if

    |a(i,i)| > sum_{j != i} |a(i,j)|   for all i = 1, 2, ..., n

i.e. the diagonal entry dominates its row sum.

The result comes indirectly through Gaussian elimination:

Theorem 2.2.2. A strictly diagonally dominant matrix A is non-singular. Furthermore Gaussian elimination can be performed on any linear system of the form Ax = B to obtain its unique solution without row or column interchanges, and the computations are stable with respect to the growth of round-off errors.

When can we perform an LU decomposition? The following theorem gives the answer,
Theorem 2.2.3. If Gaussian elimination can be performed on the linear system Ax = B without row interchanges then the matrix A can be factored into the product of a lower-triangular matrix L and an upper-triangular matrix U, with A = LU.

There is another type of factorization which is in fact very similar to this LU or Doolittle decomposition. The alternate factorization method also produces an LU decomposition, but with U a unit upper triangular matrix instead of L. This is called Crout's factorization. Naturally either factorization will do the job, and producing one or the other is more a matter of taste than anything else. You can change the provided pseudo-code very easily in order to produce such a factorization.

LDL^T and LL^T or Choleski's factorization

We continue here by presenting more methods for factoring A. All the techniques presented, similarly to the LU decomposition of A, have the same overall operational cost of O(n^3). As the name denotes, an LDL^T type factorization takes the following form,

    A = L D L^T

where L as usual is lower triangular (with 1s on the diagonal) and D is a diagonal matrix with positive entries on the diagonal. Similarly the Choleski factorization A = L L^T consists of a lower and an upper triangular matrix, neither of which has 1s on the diagonal (in contrast to either Doolittle's or Crout's factorizations). It is very easy to construct any of the above factorizations once you have an LU decomposition of A.
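The LU pseudo-code and the two substitution sweeps above translate into a short Python sketch (pure Python, function names ours); we can check it against the 3 × 3 system solved in the worked example earlier in this section.

```python
def lu_doolittle(A):
    """Doolittle LU factorization (no pivoting): A = L U with 1s on the
    diagonal of L.  A is a list of n rows.  Raises if a zero pivot appears."""
    n = len(A)
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    U = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):          # row i of U
            U[i][j] = A[i][j] - sum(L[i][k] * U[k][j] for k in range(i))
        if U[i][i] == 0.0:
            raise ValueError("factorization is not possible")
        for j in range(i + 1, n):      # column i of L
            L[j][i] = (A[j][i] - sum(L[j][k] * U[k][i] for k in range(i))) / U[i][i]
    return L, U

def lu_solve(L, U, b):
    """Solve Ax = b given A = LU: forward substitution for Ly = b
    (no division needed since l(i,i) = 1), then back-substitution for Ux = y."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = b[i] - sum(L[i][j] * y[j] for j in range(i))
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - sum(U[i][j] * x[j] for j in range(i + 1, n))) / U[i][i]
    return x
```

On the example system, `lu_doolittle` reproduces the factors found by hand, and `lu_solve` returns x = (11/6, 1/3, -5/6).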
Let us look at the equivalent factorizations for the following matrix

    A = [ 60  30  20 ]
        [ 30  20  15 ]
        [ 20  15  12 ]

Using our pseudo-code we obtain the following LU decomposition of A,

    A = LU = [  1    0   0 ] [ 60  30  20  ]
             [ 1/2   1   0 ] [  0   5   5  ]
             [ 1/3   1   1 ] [  0   0  1/3 ]

Now the equivalent LDL^T decomposition consists of the following three matrices,

    A = LDL^T = [  1    0   0 ] [ 60  0   0  ] [ 1  1/2  1/3 ]
                [ 1/2   1   0 ] [  0  5   0  ] [ 0   1    1  ]
                [ 1/3   1   1 ] [  0  0  1/3 ] [ 0   0    1  ]

Note how the new upper triangular matrix has been obtained by simply dividing each row of the old upper triangular matrix by the respective diagonal element. Now that we have the LDL^T factorization we can also easily obtain the equivalent Crout factorization,

    A = [ 60   0   0  ] [ 1  1/2  1/3 ]
        [ 30   5   0  ] [ 0   1    1  ]
        [ 20   5  1/3 ] [ 0   0    1  ]
Note here that the new lower triangular matrix is constructed by simply multiplying out the matrices L and D. Last, the Choleski decomposition is also easily constructed from the LDL^T form above by splitting the diagonal matrix D into two factors, D = sqrt(D) sqrt(D), and multiplying out L sqrt(D) to produce a lower triangular and sqrt(D) L^T to produce an upper triangular matrix,

    A = L sqrt(D) sqrt(D) L^T

with

    L sqrt(D) = [  1    0   0 ] [ sqrt(60)    0        0      ]   [  sqrt(60)      0         0      ]
                [ 1/2   1   0 ] [    0     sqrt(5)     0      ] = [ sqrt(60)/2  sqrt(5)      0      ]
                [ 1/3   1   1 ] [    0        0    1/sqrt(3)  ]   [ sqrt(60)/3  sqrt(5)  1/sqrt(3)  ]

and sqrt(D) L^T equal to its transpose. This is the LL^T form of the matrix A.

Let us now look at results regarding when we can perform most of these factorizations. We will first need the following definition.

Definition 2.2.4. A matrix A is positive definite if it is symmetric and if x^T A x > 0 for every x != 0.

Thus based on this definition the following theorem holds,

Theorem 2.2.5. If A is an n × n positive definite matrix then the following hold:

    A is nonsingular
    a(i,i) > 0 for all i = 1, 2, ..., n
    max_{k,j} |a(k,j)| <= max_i |a(i,i)|
    a^2(i,j) < a(i,i) a(j,j) for each i != j

Recall that one of the conditions for a matrix to be nonsingular is that det A != 0. Further,

Theorem 2.2.6. A symmetric matrix A is positive definite if and only if any of the following hold,

    A = LDL^T (with positive diagonal entries in D)
    A = LL^T (with nonsingular lower triangular L)
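As a sketch of the last step, here is a direct Choleski factorization in Python (pure Python with math.sqrt; the function name is our own), checked against the 3 × 3 matrix factored above.

```python
from math import sqrt

def cholesky(A):
    """Choleski factorization A = L L^T for a symmetric positive definite A
    (given as a list of n rows).  Returns the lower triangular factor L."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:                     # a nonpositive pivot means
                    raise ValueError("not positive definite")
                L[i][i] = sqrt(s)                # diagonal: square root of pivot
            else:
                L[i][j] = s / L[j][j]            # below-diagonal entry
    return L

A = [[60.0, 30, 20], [30, 20, 15], [20, 15, 12]]
L = cholesky(A)
# First column of L: sqrt(60), sqrt(60)/2, sqrt(60)/3, as computed by hand above.
```

The same factor can of course be read off from L sqrt(D) once the LDL^T form is known; the direct recursion above just merges the two steps.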
2.3 Iterative methods for Ax = B

As the name denotes we will now attempt to solve the system Ax = B with an iterative scheme instead of a direct method. The difference is that the solution produced by any of the direct methods presented in the previous section is exact (up to round-off) and is determined after a fixed amount of work. In contrast, as is the case with any iterative scheme, the solutions are obtained after a number of iterations and are not exact but only approximations within a given tolerance of the true solution. As we will see in the following section, iterative techniques are quite useful when the number of equations to be solved is large (i.e. the size of the matrix is large). Furthermore such methods tend to be stable with regard to matrices A with large condition number. As a result small initial errors do not pile up during the iterative process and blow up in the end.

2.4 Jacobi, Richardson and Gauss-Seidel methods

We start by discovering the Jacobi and Gauss-Seidel iterative methods with a simple example in two dimensions. The general treatment for either method will be presented after the example. The most basic iterative scheme is considered to be the Jacobi iteration. It is based on a very simple idea: solve each row of your system for the diagonal entry. Thus if for instance we wish to solve the following system,

    [  4  -3 ] [ x1 ]   [ 5 ]
    [ -2   5 ] [ x2 ] = [ 6 ]

we first solve each row for the diagonal element and obtain

    x1 = (3/4) x2 + 5/4
    x2 = (2/5) x1 + 6/5                                           (2.2)

Thus the Jacobi iterative scheme starts with some guess for x1 and x2 on the right hand side of this equation and hopefully produces, after several iterations, improved estimates which approach the true solution x. In matrix form the system above can be written as,

    [ x1^m ]   [  0   3/4 ] [ x1^{m-1} ]   [ 5/4 ]
    [ x2^m ] = [ 2/5   0  ] [ x2^{m-1} ] + [ 6/5 ]                (2.3)

where you can clearly see how the iteration is progressing.
Your previous estimate of the solution, x^{m-1}, goes in the right hand side and you obtain a new (and hopefully better) estimate x^m on the left hand side. Let us examine the output of the Jacobi scheme for a few iterations based on this numerical example. We will assume that the initial guess is taken to be, without loss of generality, x^0 = [0, 0]^T.

    n      0      5          10         15         20         Exact Sol.
    x1^n   0      2.907500   3.063965   3.071030   3.071410   3.071428571
    x2^n   0      2.318000   2.422670   2.428302   2.428557   2.428571428

A very simple but effective improvement has been suggested to the Jacobi scheme. It is the Gauss-Seidel method, which simply uses the new value of x1 in the second row of (2.2). Thus the Gauss-Seidel
method for the example above is,

    x1^m = (3/4) x2^{m-1} + 5/4
    x2^m = (2/5) x1^m + 6/5                                       (2.4)

where the second row now uses the newly computed x1^m rather than x1^{m-1}. Let us similarly compare a small number of iterates using the Gauss-Seidel method in the following table.

    n      0      5          10         15         20         Exact Sol.
    x1^n   0      3.056675   3.071393   3.071428   3.071428   3.071428571
    x2^n   0      2.422670   2.428557   2.428571   2.428571   2.428571428

You may be wondering, rightfully so, whether it really is that simple... In other words, whether the method as outlined above works all the time. The answer is NO! The reason that things worked out so nicely in the example presented above is that the matrix A is diagonally dominant. In fact we provide the relevant theorems for when things are expected to work out for either the Jacobi or the Gauss-Seidel method below (see Theorems 2.4.3 and 2.4.4).

Generalization of iterative methods

We will now generalize our findings and produce a general theory under which to study iterative schemes. In order to do this we make use of an auxiliary matrix Q, to be specified later. The idea relies on what we learned earlier about fixed point problems in one dimension. Let us start by outlining our general set-up. We start as usual from the main system of equations in matrix form

    Ax = B                                                        (2.5)

where as we have seen before A and B are known while x denotes the vector of unknowns. First we move the Ax term to obtain 0 = -Ax + B and then we add an auxiliary term Qx on both sides of (2.5) to produce the following system

    Qx = (Q - A)x + B                                             (2.6)

This new system will be used in order to define our iteration as follows

    Q x^m = (Q - A) x^{m-1} + B

First we observe that in fact the solution of (2.6) is simply found from

    x = Q^{-1}(Q - A)x + Q^{-1}B = (I - Q^{-1}A)x + Q^{-1}B.

The iterative scheme corresponding to this set-up is clearly recognizable as a fixed point problem in n dimensions:

    x^m = G x^{m-1} + C                                           (2.7)

where G = I - Q^{-1}A and C = Q^{-1}B.
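The two iterations for the 2 × 2 example can be checked with a few lines of Python (a sketch; function names are ours). Running five steps of each reproduces the n = 5 column of the two tables above.

```python
def jacobi_2x2(m):
    """Jacobi iteration (2.3) for the 2 x 2 example above, x0 = [0, 0]:
    both components are updated from the previous iterate."""
    x1, x2 = 0.0, 0.0
    for _ in range(m):
        # simultaneous update: old values of x1, x2 on the right hand side
        x1, x2 = 0.75 * x2 + 1.25, 0.4 * x1 + 1.2
    return x1, x2

def gauss_seidel_2x2(m):
    """Gauss-Seidel iteration (2.4): the new x1 is used immediately in
    the update of x2."""
    x1, x2 = 0.0, 0.0
    for _ in range(m):
        x1 = 0.75 * x2 + 1.25
        x2 = 0.4 * x1 + 1.2      # uses the x1 just computed
    return x1, x2
```

For instance `jacobi_2x2(5)` gives (2.9075, 2.318) while `gauss_seidel_2x2(5)` gives (3.056675, 2.42267), showing Gauss-Seidel's faster approach to the exact solution (43/14, 17/7).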
The iterative process can now be initiated with a given initial vector x^0 for the solution. This is usually only a guess, but if any information is known about the solution it should be used to obtain a better initial guess x^0. Given this set-up the only (and most important) thing left to do is choose a matrix Q so that the iterative process outlined in (2.7) will
converge to the true solution x, and produce the solution in a small number of iterations.

We have already seen a couple of iterative methods which fulfill these tasks in one way or another. Let us look at a more general approach to constructing such iterative schemes. Suppose that we can write A as

    A = D - L - U

where as usual D is a diagonal matrix and L, U are lower and upper triangular matrices respectively (with zeros on their diagonals). Then the matrices for each iterative method are given below.

    Jacobi: if Q = D in (2.7) then G = D^{-1}(L + U) and C = D^{-1}B
    Richardson: if Q = I in (2.7) then G = I - A and C = B
    Gauss-Seidel: if Q = D - L in (2.7) then G = (D - L)^{-1}U and C = (D - L)^{-1}B

It is important to know when we can expect (2.7) to converge. Is it possible for this iterative scheme to always converge? The answer is naturally no! We develop below a result which indicates whether we should expect our iteration to be successful or not. We first define what we mean by convergence of an iterative method.

Definition 2.4.1. An n × n matrix A is said to be convergent if

    lim_{m -> infinity} A^m(i, j) = 0   for i, j = 1, 2, ..., n

Further the following holds:

Theorem 2.4.2. The following are equivalent:

    A is a convergent matrix
    lim_{m -> infinity} ||A^m|| = 0
    lim_{m -> infinity} A^m x = 0 for every x
    rho(A) < 1

where rho(A) denotes the spectral radius of the matrix A, which is essentially the largest, in absolute value, eigenvalue of A.

Then the following theorem gives a very useful result,

Theorem 2.4.3. The iterative scheme x^m = G x^{m-1} + C converges to the unique solution of x = Gx + C for any initial guess x^0 if and only if rho(G) < 1.

Proof: Subtracting x = Gx + C from x^m = G x^{m-1} + C we obtain

    x^m - x = G(x^{m-1} - x)

Simply taking norms on both sides of the above we have

    ||x^m - x|| = ||G(x^{m-1} - x)|| <= ||G|| ||x^{m-1} - x||
Applying this inequality repeatedly for m-1, m-2, ... we obtain

    ||x^m - x|| <= ||G|| ||x^{m-1} - x||
               <= ||G||^2 ||x^{m-2} - x||
               ...
               <= ||G||^m ||x^0 - x||

Thus clearly from the above, if we assume that rho(G) < 1 then, based on Theorem 2.4.2,

    lim_{m -> infinity} G^m = 0   and therefore   ||x^m - x|| -> 0.

Thus convergence. We leave the opposite direction of this proof to the reader since it follows from this outline.

This proof is in fact instructive in terms of answering other interesting questions, such as: how many iterations of the Jacobi iteration are necessary in order for the solution to be found within a given tolerance? Let us look at such an example.

Example: Find the number of iterations so that the Jacobi method, starting from the vector x^0 = [0, 0]^T, will reach the solution within a relative error tolerance of 10^{-4} for the following matrix A,

    A = [  4  -3 ]
        [ -2   5 ]

Solution: We need to approach the solution x using the Jacobi iteration

    x^m = G x^{m-1} + C                                           (2.8)

Suppose then that x is the solution. Then the fixed point equation corresponding to iteration (2.8) is satisfied by the solution x as follows,

    x = Gx + C

Subtracting these two equations and taking norms we obtain

    ||x^m - x|| <= ||G|| ||x^{m-1} - x||

Repeating this m-1 more times we obtain

    ||x^m - x|| <= ||G||^m ||x^0 - x||

Note however that for this problem we have chosen x^0 = [0, 0]^T. Thus the above becomes

    ||x^m - x|| <= ||G||^m ||x||
or

    ||x^m - x|| / ||x|| <= ||G||^m

We know from our matrix algebra review in the Appendix that we may employ the spectral radius in order to calculate the Euclidean norm (you may try other norms if you like instead) of G as follows: ||G||_2 = sqrt(rho(G^T G)). Thus using the Euclidean norm everywhere we obtain that the relative error satisfies

    ||x^m - x||_2 / ||x||_2 <= rho(G^T G)^{m/2}

Note that in fact the left hand side is nothing more than the relative error. Therefore in order to find out how many iterations are necessary in order to approach the solution within a relative error tolerance of 10^{-4} we must solve for m the following equation

    ( rho(G^T G) )^{m/2} = 10^{-4}

Since rho(G^T G) = 0.5625 we have ||G||_2 = 0.75 and, most importantly,

    m = ln 10^{-4} / ln ||G||_2 = -4 ln 10 / ln(3/4) = 32.0157

Thus if we choose m = 33 we should be within 10^{-4} of the true solution for this system.

Let us now look at some theoretical results for each of the methods presented so far.

Theorem 2.4.4. If A is diagonally dominant then the sequence produced by either the Jacobi or the Gauss-Seidel iterations converges to the solution of Ax = B for any starting guess x^0.

We outline the proof here only for the Jacobi iteration since the Gauss-Seidel case is similar.

Proof: Note that the Jacobi iteration matrix G can be written as G = D^{-1}(L + U). In that case, taking the infinity norm of the above, we get

    ||G||_inf = ||D^{-1}(L + U)||_inf = max_{1<=i<=n} ( sum_{j != i} |A(i, j)| / |A(i, i)| ) < 1

where the last inequality holds simply by the definition of A being diagonally dominant.

2.5 Comparisons

In terms of speed you should always keep in mind that the performance of iterative, direct or other methods depends on the problem at hand. For instance each iteration of either the Gauss-Seidel or the Jacobi method requires about n^2 operations. However if you are solving a small system of equations then Gaussian elimination is much faster. Take for example a small 3 × 3 system.
If you perform a Jacobi iteration on it you will require about 9 operations per iteration, and you may need to perform more than 100 iterations to obtain a very good estimate: a total of at least 900 operations. On the other hand Gaussian elimination only needs to perform about 3^3 = 27 operations to solve the
whole system and produce the exact solution! In fact it can be shown that iteration is preferable to Gaussian elimination if

    ln eps / ln rho < n/3                                         (2.9)

Here n corresponds to the size of the matrix A, rho refers to the spectral radius of the iterative scheme, rho = rho(G), and eps is the given relative error tolerance we wish to obtain from the iteration. Let us look at a simple example of this result.

Example: As usual we wish to solve the matrix system Ax = B. Suppose that A is a 30 × 30 matrix and that the spectral radius for the Gauss-Seidel iterative scheme is found to be rho(G) = 0.4. Suppose also that we wish to find the solution accurate to within eps = 10^{-5}. Is it best to perform the Gauss-Seidel iteration or just simple Gaussian elimination?

Solution: Note that for this example

    ln eps / ln rho(G) = -11.5129 / -0.9163 = 12.56

Therefore inequality (2.9) would require

    12.56 < 30/3 = 10

Not true! Thus in this case Gaussian elimination is actually going to be faster!

One thing to keep in mind is that in fact there are matrix systems for which one method might converge while the other might not (the reason being that the spectral radius of the iteration is not less than 1). Let us outline some important points about these methods and compare them with other techniques:

    Gauss-Seidel is faster than Jacobi.
    Gauss-Seidel and Jacobi methods have a cost of about n^2 operations per iteration.
    One iterative scheme may converge to the solution while another may not. This may depend on the choice of initial guess x^0 but more importantly on the spectral radius of the iterative scheme (rho(G) < 1).
    Gaussian elimination, although it costs about n^3 operations, may be faster when it comes to moderate size systems.

Let us see the pseudo-code for some methods.

Jacobi: Suppose that we are provided with a matrix A, a vector B and a starting guess vector x^0.

1. For i = 1 to n do Y(i) = x^0(i).
2.
While the given tolerance eps is not yet satisfied do the following:
   For i = 1 to n do
      Z(i) = ( B(i) - sum_{j=1}^{i-1} A(i, j) Y(j) - sum_{j=i+1}^{n} A(i, j) Y(j) ) / A(i, i)
   For i = 1 to n do Y(i) = Z(i).
3. Print out the vector Z.

Gauss-Seidel: Suppose that we are provided with a matrix A, a vector B and a starting guess vector x^0.

1. For i = 1 to n do Y(i) = x^0(i).
2. While the given tolerance eps is not yet satisfied do the following:
   For i = 1 to n do the following two steps:
      Z(i) = ( B(i) - sum_{j=1}^{i-1} A(i, j) Y(j) - sum_{j=i+1}^{n} A(i, j) Y(j) ) / A(i, i)
      Y(i) = Z(i)
3. Print out the vector Z.

However we can in fact come up with methods which, under appropriate conditions, converge to the solution even faster (see the SOR and SSOR methods for instance).
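The two pseudo-codes above translate almost line for line into Python. In this sketch the function names are ours, and a simple step-difference test stands in for "the given tolerance is satisfied" (the pseudo-code leaves the exact stopping test unspecified).

```python
def jacobi(A, B, x0, eps=1e-10, max_iter=1000):
    """Jacobi iteration: every Z(i) is computed from the previous sweep Y."""
    n = len(B)
    Y = list(x0)
    for _ in range(max_iter):
        Z = [(B[i] - sum(A[i][j] * Y[j] for j in range(n) if j != i)) / A[i][i]
             for i in range(n)]
        # stopping test (our choice): largest change between successive sweeps
        if max(abs(Z[i] - Y[i]) for i in range(n)) < eps:
            return Z
        Y = Z
    return Y

def gauss_seidel(A, B, x0, eps=1e-10, max_iter=1000):
    """Gauss-Seidel iteration: each new value Y(i) is used immediately."""
    n = len(B)
    Y = list(x0)
    for _ in range(max_iter):
        diff = 0.0
        for i in range(n):
            z = (B[i] - sum(A[i][j] * Y[j] for j in range(n) if j != i)) / A[i][i]
            diff = max(diff, abs(z - Y[i]))
            Y[i] = z                     # overwrite in place: used for later rows
        if diff < eps:
            return Y
    return Y
```

Running either function on the diagonally dominant 2 × 2 example of Section 2.4 (A = [[4, -3], [-2, 5]], B = [5, 6]) converges to the exact solution (43/14, 17/7), with Gauss-Seidel needing fewer sweeps, as the tables there suggest.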