Solving Linear Systems of Equations

November 6, 2013

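Introduction

The types of problems that we have to solve are:

1. Solve the system $A\mathbf{x} = \mathbf{B}$, where
$$
A = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1N} \\
a_{21} & a_{22} & \cdots & a_{2N} \\
\vdots & \vdots & & \vdots \\
a_{N1} & a_{N2} & \cdots & a_{NN}
\end{pmatrix},
\qquad
\mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_N \end{pmatrix},
\qquad
\mathbf{B} = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_N \end{pmatrix}
$$
2. Find $A^{-1}$ (the inverse of a matrix).
3. Find the determinant of a matrix $A$.
4. Find the eigenvalues and eigenvectors of a matrix $A$.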

Gauss Method

Let us assume that we have to solve the linear system $A\mathbf{x} = \mathbf{B}$ with $\det(A) \neq 0$:
$$
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + a_{13}x_3 + \cdots + a_{1N}x_N &= b_1 \\
a_{21}x_1 + a_{22}x_2 + a_{23}x_3 + \cdots + a_{2N}x_N &= b_2 \\
&\;\;\vdots \\
a_{N1}x_1 + a_{N2}x_2 + a_{N3}x_3 + \cdots + a_{NN}x_N &= b_N
\end{aligned}
\tag{1}
$$
We will try to transform it into an upper-triangular linear system. The steps are the following:

Gauss Method

STEP 1: Multiply the 1st equation by $a_{21}/a_{11}$ and subtract it from the 2nd. Similarly, multiply the 1st by $a_{31}/a_{11}$ and subtract it from the 3rd, etc.
$$
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + a_{13}x_3 + \cdots + a_{1N}x_N &= b_1 \\
0 + a^{(1)}_{22}x_2 + a^{(1)}_{23}x_3 + \cdots + a^{(1)}_{2N}x_N &= b^{(1)}_2 \\
&\;\;\vdots \\
0 + a^{(1)}_{N2}x_2 + a^{(1)}_{N3}x_3 + \cdots + a^{(1)}_{NN}x_N &= b^{(1)}_N
\end{aligned}
\tag{2}
$$
where, for example, we set:
$$
a^{(1)}_{22} = a_{22} - \frac{a_{12}\,a_{21}}{a_{11}}
$$

Gauss Method

STEP 2: The 1st row and the 1st column remain unchanged. We multiply the 2nd equation by $a^{(1)}_{32}/a^{(1)}_{22}$ and subtract it from the 3rd, and so on for the remaining rows. After these operations we get
$$
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + a_{13}x_3 + \cdots + a_{1N}x_N &= b_1 \\
0 + a^{(1)}_{22}x_2 + a^{(1)}_{23}x_3 + \cdots + a^{(1)}_{2N}x_N &= b^{(1)}_2 \\
0 + 0 + a^{(2)}_{33}x_3 + \cdots + a^{(2)}_{3N}x_N &= b^{(2)}_3 \\
&\;\;\vdots \\
0 + 0 + a^{(2)}_{N3}x_3 + \cdots + a^{(2)}_{NN}x_N &= b^{(2)}_N
\end{aligned}
\tag{3}
$$
where, for example, we have set:
$$
a^{(2)}_{33} = a^{(1)}_{33} - \frac{a^{(1)}_{32}\,a^{(1)}_{23}}{a^{(1)}_{22}}
$$

Gauss Method

After $N-1$ steps we get the following upper-triangular system:
$$
\begin{aligned}
a_{11}x_1 + a_{12}x_2 + a_{13}x_3 + \cdots + a_{1N}x_N &= b_1 \\
a^{(1)}_{22}x_2 + a^{(1)}_{23}x_3 + \cdots + a^{(1)}_{2N}x_N &= b^{(1)}_2 \\
a^{(2)}_{33}x_3 + \cdots + a^{(2)}_{3N}x_N &= b^{(2)}_3 \\
&\;\;\vdots \\
0 + 0 + \cdots + a^{(N-1)}_{NN}x_N &= b^{(N-1)}_N
\end{aligned}
\tag{4}
$$
From the $N$th equation of system (4) we get:
$$
x_N = \frac{b^{(N-1)}_N}{a^{(N-1)}_{NN}} \quad \text{for } a^{(N-1)}_{NN} \neq 0 \tag{5}
$$
while the rest of the values can be calculated, for $i = N-1, N-2, \ldots, 1$, via the relation:
$$
x_i = \frac{b^{(i-1)}_i - \sum_{k=i+1}^{N} a^{(i-1)}_{ik}\,x_k}{a^{(i-1)}_{ii}} \quad \text{for } a^{(i-1)}_{ii} \neq 0 \tag{6}
$$
The number of arithmetic operations needed is $(4N^3 + 9N^2 - 7N)/6$.
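The elimination and back-substitution steps can be sketched in a few lines of Python. This is only a minimal illustration: the function name `gauss_solve` and the 3×3 test system are assumptions made for this example, and no pivoting is done, so every pivot is assumed to be nonzero.

```python
def gauss_solve(A, b):
    """Solve A x = b by Gaussian elimination without pivoting.

    A is a list of N rows (lists of floats), b a list of N floats.
    Assumes that every pivot encountered is nonzero.
    """
    n = len(A)
    # Work on copies so the caller's data is not modified.
    A = [row[:] for row in A]
    b = b[:]

    # Forward elimination: after step i, column i is zero below the diagonal.
    for i in range(n - 1):
        for r in range(i + 1, n):
            factor = A[r][i] / A[i][i]
            for c in range(i, n):
                A[r][c] -= factor * A[i][c]
            b[r] -= factor * b[i]

    # Back substitution, following eqs. (5) and (6).
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(A[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (b[i] - s) / A[i][i]
    return x


# Illustrative system (an assumption for this sketch); its solution is (2, 4, 3).
A = [[4.0, -1.0, 1.0],
     [4.0, -8.0, 1.0],
     [-2.0, 1.0, 5.0]]
b = [7.0, -21.0, 15.0]
x = gauss_solve(A, b)
print(x)
```

The three nested loops of the elimination are what produce the $N^3$ term in the operation count.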

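Determinant

If a matrix has upper-triangular, lower-triangular or diagonal form, then its determinant is the product of its diagonal elements:
$$
\det A = a_{11}\,a_{22}\,a_{33} \cdots a_{NN} = \prod_{i=1}^{N} a_{ii} \tag{7}
$$
The row subtractions of the Gauss method do not change the determinant, so (7) applied to the triangular system gives $\det A$ of the original matrix. Each interchange of two rows changes the sign of the determinant.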
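As a rough sketch of how eq. (7) is used in practice, the following Python function reduces a matrix to upper-triangular form and multiplies the diagonal. The name `determinant` and the test matrices are assumptions for this illustration. Rows are interchanged to pick the largest available pivot, and the sign is flipped for each interchange.

```python
def determinant(A):
    """Determinant via reduction to upper-triangular form, eq. (7)."""
    n = len(A)
    A = [[float(v) for v in row] for row in A]  # work on a copy
    det = 1.0
    for i in range(n):
        # Choose the row with the largest |entry| in column i (on or below the diagonal).
        p = max(range(i, n), key=lambda r: abs(A[r][i]))
        if A[p][i] == 0.0:
            return 0.0  # whole column is zero: the matrix is singular
        if p != i:
            A[i], A[p] = A[p], A[i]
            det = -det  # a row interchange changes the sign
        det *= A[i][i]
        # Row subtractions do not change the determinant.
        for r in range(i + 1, n):
            factor = A[r][i] / A[i][i]
            for c in range(i, n):
                A[r][c] -= factor * A[i][c]
    return det


print(determinant([[1, 2, 3], [-1, 3, 1], [2, 0, 1]]))
print(determinant([[1, 2], [2, 4]]))
```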

Pivoting

DEFINITION: The number $a_{ii}$ in the position $(i, i)$ that is used to eliminate $x_i$ in rows $i+1, i+2, \ldots, N$ is called the $i$th pivotal element, and the $i$th row is called the pivotal row.

Pivoting to avoid $a^{(i-1)}_{ii} = 0$: If $a^{(i-1)}_{ii} = 0$, row $i$ cannot be used to eliminate the elements in column $i$ below the diagonal. It is necessary to find a row $j$, with $j > i$ and $a^{(i-1)}_{ji} \neq 0$, and then interchange rows $i$ and $j$ so that a nonzero pivot element is obtained.

Pivoting to reduce error: If there is more than one nonzero element in column $i$ that lies on or below the diagonal, there is a choice of which rows to interchange. Because the computer uses fixed-precision arithmetic, a small error may be introduced each time an arithmetic operation is performed. One should check the magnitude of all the elements in column $i$ that lie on or below the diagonal and locate a row $j$ in which the element has the largest absolute value, that is
$$
|a^{(i-1)}_{ji}| = \max\left\{ |a^{(i-1)}_{ii}|,\; |a^{(i-1)}_{i+1\,i}|,\; \ldots,\; |a^{(i-1)}_{N-1\,i}|,\; |a^{(i-1)}_{N\,i}| \right\},
$$
and then switch row $i$ with row $j$ if $j > i$. Usually, the larger pivot element will result in a smaller error being propagated.

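Pivoting : Example

Let us assume that after a number of operations the last two equations of an $N \times N$ system are
$$
0 \cdot x_{N-1} + x_N = 1 \quad \text{and} \quad 2x_{N-1} + x_N = 3
$$
The obvious solution is $x_{N-1} = x_N = 1$. But due to the accumulation of numerical errors the system will be:
$$
\begin{aligned}
&\epsilon x_{N-1} + x_N = 1 \quad \text{and} \quad 2x_{N-1} + x_N = 3 \\
\Rightarrow\; &\epsilon x_{N-1} + x_N = 1 \quad \text{and} \quad (1 - 2/\epsilon)\,x_N = 3 - 2/\epsilon \\
\Rightarrow\; &x_N = \frac{3 - 2/\epsilon}{1 - 2/\epsilon} \approx 1 \;\text{(correct)}, \quad \text{but} \quad x_{N-1} = \frac{1 - x_N}{\epsilon}\,!
\end{aligned}
$$
I.e. $x_{N-1}$ is the ratio of two small numbers with questionable accuracy. This erroneous term will then be used for the estimation of $x_{N-2}, \ldots, x_1$, with disastrous results.

With pivoting, the system will be written as:
$$
\begin{aligned}
&2x_{N-1} + x_N = 3 \quad \text{and} \quad \epsilon x_{N-1} + x_N = 1 \\
\Rightarrow\; &2x_{N-1} + x_N = 3 \quad \text{and} \quad (1 - \epsilon/2)\,x_N = 1 - 3\epsilon/2
\end{aligned}
$$
then $x_N \approx 1.0$ and $x_{N-1} = \dfrac{3 - x_N}{2} \approx 1.0$.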
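The effect described in this example can be reproduced in double-precision arithmetic. The sketch below is an illustration under stated assumptions: the helper `solve_2x2` and the value `eps = 1e-20` are choices made here, not part of the original slides.

```python
def solve_2x2(rows, pivot):
    """Solve a 2x2 system given as [[a11, a12, b1], [a21, a22, b2]].

    If pivot is True, the rows are interchanged when |a21| > |a11|.
    """
    (a11, a12, b1), (a21, a22, b2) = rows
    if pivot and abs(a21) > abs(a11):
        a11, a12, b1, a21, a22, b2 = a21, a22, b2, a11, a12, b1
    factor = a21 / a11          # multiplier used to eliminate the first unknown
    a22 -= factor * a12
    b2 -= factor * b1
    x2 = b2 / a22               # second unknown from the reduced equation
    x1 = (b1 - a12 * x2) / a11  # back substitution for the first unknown
    return x1, x2


eps = 1e-20  # stands for the small rounding residue (assumed value)
rows = [[eps, 1.0, 1.0],
        [2.0, 1.0, 3.0]]

naive = solve_2x2(rows, pivot=False)
pivoted = solve_2x2(rows, pivot=True)
print(naive)    # (0.0, 1.0): the first unknown is lost completely
print(pivoted)  # (1.0, 1.0)
```

Without pivoting, the multiplier $2/\epsilon$ swamps the other coefficients and the numerator $1 - x_N$ evaluates to exactly zero; with the rows interchanged the multiplier is $\epsilon/2$ and nothing is lost.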

Gauss - Jordan Method

It is a variation of Gauss's method. Here, instead of transforming the original system into an upper-triangular one, we transform it into a diagonal one, and then the solution is obvious. The method follows the steps of Gauss's method with parallel elimination of the elements above the diagonal. Thus, after the 1st step of Gaussian elimination, in the 2nd step we also multiply the 2nd equation by $a_{12}/a^{(1)}_{22}$ and subtract it from the 1st. The system becomes:
$$
\begin{aligned}
a_{11}x_1 + 0 + a^{(2)}_{13}x_3 + \cdots + a^{(2)}_{1N}x_N &= b^{(2)}_1 \\
0 + a^{(1)}_{22}x_2 + a^{(1)}_{23}x_3 + \cdots + a^{(1)}_{2N}x_N &= b^{(1)}_2 \\
0 + 0 + a^{(2)}_{33}x_3 + \cdots + a^{(2)}_{3N}x_N &= b^{(2)}_3 \\
&\;\;\vdots \\
0 + 0 + a^{(2)}_{N3}x_3 + \cdots + a^{(2)}_{NN}x_N &= b^{(2)}_N
\end{aligned}
\tag{8}
$$
and in the next step we eliminate the terms with coefficients $a^{(2)}_{13}$ and $a^{(1)}_{23}$.

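Gauss - Jordan Method

Thus after $N$ steps we get the system
$$
\begin{aligned}
a_{11}x_1 &= b^{(N-1)}_1 \\
a^{(1)}_{22}x_2 &= b^{(N-1)}_2 \\
&\;\;\vdots \\
a^{(N-1)}_{NN}x_N &= b^{(N-1)}_N
\end{aligned}
\tag{9}
$$
with the obvious solution
$$
x_i = \frac{b^{(N-1)}_i}{a^{(i-1)}_{ii}}. \tag{10}
$$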
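A possible Python sketch of the Gauss - Jordan procedure is given below. The function name `gauss_jordan_solve` and the test system are assumptions for this illustration. Unlike the plain description above, the sketch interchanges rows to select the largest pivot.

```python
def gauss_jordan_solve(A, b):
    """Solve A x = b by reducing the augmented matrix [A | b] to diagonal form."""
    n = len(A)
    M = [[float(v) for v in A[i]] + [float(b[i])] for i in range(n)]
    for i in range(n):
        # Partial pivoting: bring the largest |entry| of column i to the diagonal.
        p = max(range(i, n), key=lambda r: abs(M[r][i]))
        if M[p][i] == 0.0:
            raise ValueError("matrix is singular")
        M[i], M[p] = M[p], M[i]
        # Eliminate column i from every other row, above and below the diagonal.
        for r in range(n):
            if r != i:
                factor = M[r][i] / M[i][i]
                for c in range(i, n + 1):
                    M[r][c] -= factor * M[i][c]
    # The matrix is now diagonal, so each unknown follows as in eq. (10).
    return [M[i][n] / M[i][i] for i in range(n)]


# Illustrative system (an assumption for this sketch); its solution is (2, 4, 3).
A = [[4, -1, 1], [4, -8, 1], [-2, 1, 5]]
b = [7, -21, 15]
x = gauss_jordan_solve(A, b)
print(x)
```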

The Jacobi Method (iterative)

It is a generalization of the method $x = g(x)$ presented in the previous section. Let us assume the following system of $N$ linear equations with $N$ unknowns:
$$
\begin{aligned}
f_1(x_1, x_2, \ldots, x_N) &= 0 \\
f_2(x_1, x_2, \ldots, x_N) &= 0 \\
&\;\;\vdots \\
f_N(x_1, x_2, \ldots, x_N) &= 0
\end{aligned}
\tag{11}
$$
This can easily be written in the form:
$$
\begin{aligned}
x_1 &= g_1(x_2, x_3, \ldots, x_N) \\
x_2 &= g_2(x_1, x_3, \ldots, x_N) \\
&\;\;\vdots \\
x_N &= g_N(x_1, x_2, \ldots, x_{N-1})
\end{aligned}
\tag{12}
$$
or in closed form:
$$
x_i = \frac{b_i}{a_{ii}} - \frac{1}{a_{ii}} \sum_{j=1,\, j \neq i}^{N} a_{ij}\,x_j \tag{13}
$$

The Jacobi Method (iterative)

Then, by giving $N$ initial values $x^{(0)}_1, x^{(0)}_2, \ldots, x^{(0)}_N$, we create the recurrence relation
$$
x^{(k+1)}_i = g_i\left(x^{(k)}_1, \ldots, x^{(k)}_N\right) \tag{14}
$$
which will converge to the solution of the system if, for every row $i$,
$$
|a_{ii}| > \sum_{j=1,\, j \neq i}^{N} |a_{ij}| \tag{15}
$$
independently of the choice of the initial values $x^{(0)}_1, x^{(0)}_2, \ldots, x^{(0)}_N$.

The recurrence relation can be written in matrix form as:
$$
\mathbf{x}^{(k+1)} = D^{-1}\mathbf{B} - D^{-1}C\,\mathbf{x}^{(k)}
$$
where $A = D + C$, i.e. the matrix $D$ includes only the diagonal elements of $A$ and the matrix $C$ all the rest.

Jacobi Method : Example

For the system
$$
\begin{aligned}
4x - y + z &= 7 \\
4x - 8y + z &= -21 \\
-2x + y + 5z &= 15
\end{aligned}
$$
with solution $x = 2$, $y = 4$, $z = 3$, we can create the recurrence relations:
$$
x^{(k+1)} = \frac{7 + y^{(k)} - z^{(k)}}{4}, \qquad
y^{(k+1)} = \frac{21 + 4x^{(k)} + z^{(k)}}{8}, \qquad
z^{(k+1)} = \frac{15 + 2x^{(k)} - y^{(k)}}{5}
$$
Then, assuming the initial values $(1, 2, 2)$, we get the following sequence of solutions:

$(1, 2, 2) \to (1.75, 3.375, 3) \to (1.844, 3.875, 3.025) \to (1.963, 3.925, 2.963) \to (1.991, 3.977, 3.0) \to (1.994, 3.995, 3.001)$

I.e. with 5 iterations we reached the solution with about 3 digits of accuracy.
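The iteration of this example can be sketched in Python as follows. The function name `jacobi` and the choice to return the whole history of iterates are assumptions made for this illustration.

```python
def jacobi(A, b, x0, iterations):
    """Jacobi iteration, eq. (13): every component uses only the previous iterate."""
    n = len(A)
    x = x0[:]
    history = [x[:]]
    for _ in range(iterations):
        x_new = []
        for i in range(n):
            s = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x_new.append((b[i] - s) / A[i][i])
        x = x_new  # the new values replace the old ones only after a full sweep
        history.append(x[:])
    return history


A = [[4.0, -1.0, 1.0],
     [4.0, -8.0, 1.0],
     [-2.0, 1.0, 5.0]]
b = [7.0, -21.0, 15.0]

history = jacobi(A, b, [1.0, 2.0, 2.0], 5)
for step in history:
    print([round(v, 3) for v in step])
```

The matrix is strictly diagonally dominant ($4 > 2$, $8 > 5$, $5 > 3$), so condition (15) holds and the iteration converges.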

Gauss - Seidel Method

A variation of Jacobi's method. Let us assume a trial solution $x^{(0)}_i$. From the 1st equation we estimate $x^{(1)}_1$, and in the 2nd equation we use the set of trial values $\left(x^{(1)}_1, x^{(0)}_2, x^{(0)}_3, \ldots, x^{(0)}_N\right)$ to estimate $x^{(1)}_2$. Then we use the trial values $\left(x^{(1)}_1, x^{(1)}_2, x^{(0)}_3, \ldots, x^{(0)}_N\right)$ to estimate $x^{(1)}_3$, and so on. The recurrence relations will be:
$$
\begin{aligned}
x^{(k+1)}_1 &= \frac{1}{a_{11}} \left( b_1 - \sum_{j=2}^{N} a_{1j}\,x^{(k)}_j \right) \\
x^{(k+1)}_2 &= \frac{1}{a_{22}} \left( b_2 - a_{21}\,x^{(k+1)}_1 - \sum_{j=3}^{N} a_{2j}\,x^{(k)}_j \right) \\
&\;\;\vdots \\
x^{(k+1)}_i &= \frac{1}{a_{ii}} \left( b_i - \sum_{j=1}^{i-1} a_{ij}\,x^{(k+1)}_j - \sum_{j=i+1}^{N} a_{ij}\,x^{(k)}_j \right)
\end{aligned}
\tag{16}
$$

Gauss - Seidel Method

The method will converge if:
$$
|a_{ii}| > \sum_{j=1,\, j \neq i}^{N} |a_{ij}|, \qquad i = 1, 2, \ldots, N \tag{17}
$$
In matrix form this procedure is written as:
$$
\mathbf{x}^{(k+1)} = D^{-1}\left( \mathbf{B} - L\,\mathbf{x}^{(k+1)} - U\,\mathbf{x}^{(k)} \right) \tag{18}
$$
where
$$
A = \underbrace{L}_{\text{lower}} + \underbrace{D}_{\text{diagonal}} + \underbrace{U}_{\text{upper}}.
$$
The matrix $L$ has the elements of $A$ below the diagonal, the matrix $D$ only the diagonal elements of $A$, and the matrix $U$ the elements of $A$ above the diagonal.

Gauss - Seidel : Example

The recurrence relations for the previous example have the same form, except that each newly computed value is used as soon as it is available:
$$
x^{(k+1)} = \frac{7 + y^{(k)} - z^{(k)}}{4}, \qquad
y^{(k+1)} = \frac{21 + 4x^{(k+1)} + z^{(k)}}{8}, \qquad
z^{(k+1)} = \frac{15 + 2x^{(k+1)} - y^{(k+1)}}{5}
$$
With the Gauss - Seidel method we get the following sequence of approximate solutions:

$(1, 2, 2) \to (1.75, 3.75, 2.95) \to (1.95, 3.97, 2.99) \to (1.996, 3.996, 2.999)$

i.e. here we need only 3 iterations to find the solution, while with Jacobi's method we need 5 iterations.
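A minimal Python sketch of the Gauss - Seidel sweep for this example follows. The function name `gauss_seidel` is an assumption for this illustration. The only difference from a Jacobi sweep is that the vector is updated in place, so later components already see the new values.

```python
def gauss_seidel(A, b, x0, iterations):
    """Gauss-Seidel iteration, eq. (16): new values are used immediately."""
    n = len(A)
    x = x0[:]
    history = [x[:]]
    for _ in range(iterations):
        for i in range(n):
            # x[j] already holds the (k+1) value for j < i and the (k) value for j > i.
            s = sum(A[i][j] * x[j] for j in range(n) if j != i)
            x[i] = (b[i] - s) / A[i][i]
        history.append(x[:])
    return history


A = [[4.0, -1.0, 1.0],
     [4.0, -8.0, 1.0],
     [-2.0, 1.0, 5.0]]
b = [7.0, -21.0, 15.0]

history = gauss_seidel(A, b, [1.0, 2.0, 2.0], 3)
for step in history:
    print([round(v, 3) for v in step])
```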

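Inverse of a matrix

If we are seeking the inverse of the matrix $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$, this will be another matrix $B = \begin{pmatrix} x & y \\ z & w \end{pmatrix}$ with the following property:
$$
A \cdot B = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} x & y \\ z & w \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = I
$$
In order to find the elements $(x, y, z, w)$ of the inverse matrix we must solve two linear systems:
$$
\begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} x \\ z \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \end{pmatrix}
\quad \text{and} \quad
\begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} y \\ w \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \end{pmatrix}
$$
This means that computing the inverse is not a shortcut for solving a linear system, since one needs to solve linear systems in order to find it.

Notice that in order to solve both of the above systems we need to diagonalize (with the Gauss - Jordan procedure) the same matrix $A$. The only difference between the two systems is the vector on the right-hand side, which corresponds to a different column of the unit matrix.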

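Inverse of a matrix

We then construct the following scheme, with the matrix $A$ on the left and the unit matrix $I$ on the right:
$$
\left(\begin{array}{cccc|cccc}
a_{11} & a_{12} & \cdots & a_{1N} & 1 & 0 & \cdots & 0 \\
a_{21} & a_{22} & \cdots & a_{2N} & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots \\
a_{N1} & a_{N2} & \cdots & a_{NN} & 0 & 0 & \cdots & 1
\end{array}\right)
\tag{19}
$$
and whatever transformation we apply to the left-hand side (matrix $A$), we apply exactly the same to the right-hand side (matrix $I$):
$$
\left(\begin{array}{cccc|cccc}
1 & 0 & \cdots & 0 & \tilde{a}_{11} & \tilde{a}_{12} & \cdots & \tilde{a}_{1N} \\
0 & 1 & \cdots & 0 & \tilde{a}_{21} & \tilde{a}_{22} & \cdots & \tilde{a}_{2N} \\
\vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots \\
0 & 0 & \cdots & 1 & \tilde{a}_{N1} & \tilde{a}_{N2} & \cdots & \tilde{a}_{NN}
\end{array}\right)
\tag{20}
$$
With the Gauss - Jordan method the left-hand side is reduced to the unit matrix $I$, and the right-hand side becomes the inverse: the elements $\tilde{a}_{ij}$ are the elements of $A^{-1}$.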
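The scheme (19) to (20) can be sketched in Python as below. The function name `invert` and the 2×2 test matrix are assumptions for this illustration; rows are interchanged to select the largest pivot, and each pivot row is scaled so that the left block ends up as the unit matrix.

```python
def invert(A):
    """Invert A by Gauss-Jordan elimination on the augmented scheme [A | I]."""
    n = len(A)
    M = [[float(v) for v in A[i]] + [1.0 if j == i else 0.0 for j in range(n)]
         for i in range(n)]
    for i in range(n):
        # Partial pivoting on column i.
        p = max(range(i, n), key=lambda r: abs(M[r][i]))
        if M[p][i] == 0.0:
            raise ValueError("matrix is singular")
        M[i], M[p] = M[p], M[i]
        # Scale the pivot row so that the pivot becomes 1.
        pivot = M[i][i]
        M[i] = [v / pivot for v in M[i]]
        # Eliminate column i from all other rows (same operation on both blocks).
        for r in range(n):
            if r != i:
                factor = M[r][i]
                M[r] = [vr - factor * vi for vr, vi in zip(M[r], M[i])]
    # The right block now holds the inverse.
    return [row[n:] for row in M]


A = [[2, 1], [5, 3]]
A_inv = invert(A)
print(A_inv)  # close to [[3, -1], [-5, 2]]
```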

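Eigenvalues and Eigenvectors

If $A$ is an $N \times N$ matrix, then the scalars $\lambda$ for which there exists a non-zero vector $\mathbf{x}$ such that
$$
A\,\mathbf{x} = \lambda\,\mathbf{x} \tag{21}
$$
are called eigenvalues of the matrix $A$, and the vectors $\mathbf{x}$ eigenvectors. In the following example:
$$
\underbrace{\begin{pmatrix} 1 & 2 & 3 \\ -1 & 3 & 1 \\ 2 & 0 & 1 \end{pmatrix}}_{A}
\underbrace{\begin{pmatrix} 2 \\ 1 \\ -2 \end{pmatrix}}_{\mathbf{u}_1}
= \underbrace{-1}_{\lambda_1}
\underbrace{\begin{pmatrix} 2 \\ 1 \\ -2 \end{pmatrix}}_{\mathbf{u}_1}
$$
the vector $\mathbf{u}_1 = (2, 1, -2)$ is the e-vector and $\lambda_1 = -1$ the corresponding e-value of $A$.

Equation (21) has a non-zero solution $\mathbf{x}$ only if $\det(A - \lambda I) = 0$, that is
$$
\det \begin{pmatrix} 1-\lambda & 2 & 3 \\ -1 & 3-\lambda & 1 \\ 2 & 0 & 1-\lambda \end{pmatrix}
= -\left( \lambda^3 - 5\lambda^2 + 3\lambda + 9 \right) = 0
\quad \Longleftrightarrow \quad
\lambda^3 - 5\lambda^2 + 3\lambda + 9 = 0 \tag{22}
$$
and the e-values will be the roots of the characteristic polynomial (here $\lambda_i = -1$, $3$ and $3$).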
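The example can be checked with a few lines of Python. The matrix signs used here, $A = ((1, 2, 3), (-1, 3, 1), (2, 0, 1))$ with $\mathbf{u}_1 = (2, 1, -2)$ and $\lambda_1 = -1$, are a reconstruction that is consistent with the characteristic polynomial (22); the helper names are assumptions for this sketch.

```python
def mat_vec(A, x):
    """Matrix-vector product A x for lists of numbers."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def det3(M):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def char_poly(lam):
    """Characteristic polynomial of eq. (22)."""
    return lam**3 - 5 * lam**2 + 3 * lam + 9


A = [[1, 2, 3], [-1, 3, 1], [2, 0, 1]]
u1 = [2, 1, -2]

# A u1 should equal -1 * u1.
print(mat_vec(A, u1))

# det(A - lam I) should equal -char_poly(lam) for any lam.
for lam in (-1, 0, 3):
    shifted = [[A[r][c] - (lam if r == c else 0) for c in range(3)] for r in range(3)]
    print(lam, det3(shifted), -char_poly(lam))
```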

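Eigenvalues and Eigenvectors : The power method

We assume that the matrix $A$ has a dominant e-value $\lambda_1$ (this means that $\lambda_1$ is larger in absolute value than any other e-value),
$$
|\lambda_1| > |\lambda_2| \geq |\lambda_3| \geq \cdots \geq |\lambda_N| \tag{23}
$$
and that any vector $\mathbf{x}$ can be written as a linear combination of the $N$ e-vectors $\left\{ \mathbf{u}^{(1)}, \mathbf{u}^{(2)}, \ldots, \mathbf{u}^{(N)} \right\}$, i.e.
$$
A\,\mathbf{u}^{(i)} = \lambda_i\,\mathbf{u}^{(i)} \qquad (1 \leq i \leq N) \tag{24}
$$
$$
\mathbf{x} = a_1 \mathbf{u}^{(1)} + a_2 \mathbf{u}^{(2)} + \cdots + a_N \mathbf{u}^{(N)} \tag{25}
$$
If we multiply both sides of (25) by $A$, we get:
$$
\mathbf{x}^{(1)} \equiv A\,\mathbf{x} = a_1 \lambda_1 \mathbf{u}^{(1)} + a_2 \lambda_2 \mathbf{u}^{(2)} + \cdots + a_N \lambda_N \mathbf{u}^{(N)} \tag{26}
$$
If we multiply eqn (25) $k$ times by the matrix $A$, we get:
$$
\begin{aligned}
\mathbf{x}^{(k)} \equiv A^k \mathbf{x} &= a_1 \lambda_1^k \mathbf{u}^{(1)} + a_2 \lambda_2^k \mathbf{u}^{(2)} + \cdots + a_N \lambda_N^k \mathbf{u}^{(N)} \\
&= \lambda_1^k \left( a_1 \mathbf{u}^{(1)} + a_2 \left( \frac{\lambda_2}{\lambda_1} \right)^k \mathbf{u}^{(2)} + \cdots + a_N \left( \frac{\lambda_N}{\lambda_1} \right)^k \mathbf{u}^{(N)} \right)
\end{aligned}
\tag{27}
$$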

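Since $\lambda_1$ is the largest e-value in absolute value (eqn 23), $(\lambda_j/\lambda_1)^k \to 0$ as $k \to \infty$ for every $j > 1$. Thus, provided that $a_1 \neq 0$,
$$
\mathbf{x}^{(k)} = A^k \mathbf{x} \approx \lambda_1^k\, a_1 \mathbf{u}^{(1)} \tag{28}
$$
and the ratio, taken component by component,
$$
\mathbf{r}^{(k)} \equiv \frac{\mathbf{x}^{(k+1)}}{\mathbf{x}^{(k)}} = \frac{A^{k+1}\mathbf{x}}{A^k \mathbf{x}} \approx \frac{\lambda_1^{k+1}\, a_1 \mathbf{u}^{(1)}}{\lambda_1^{k}\, a_1 \mathbf{u}^{(1)}} = \lambda_1 \tag{29}
$$
leads to the e-value $\lambda_1$, while the vector defined in (28) is, up to normalization, the corresponding e-vector.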

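Eigenvalues and Eigenvectors : Example

The matrix
$$
A = \begin{pmatrix} 1 & 0 & 1 \\ -1 & 2 & 2 \\ 1 & 0 & 3 \end{pmatrix}
$$
has e-values $\lambda_1 = 3.41421356$, $\lambda_2 = 2$ and $\lambda_3 = 0.585786$, and corresponding e-vectors $\mathbf{u}^{(1)} = (0.3694, 1, 0.8918)$, $\mathbf{u}^{(2)} = (0, 1, 0)$ and $\mathbf{u}^{(3)} = (0.7735, 1, -0.3204)$.

If we assume a trial vector $\mathbf{x} = (1, 2, 1)$ and multiply it with the 5th and 6th power of $A$,
$$
A^5 = \begin{pmatrix} 68 & 0 & 164 \\ 136 & 32 & 428 \\ 164 & 0 & 396 \end{pmatrix},
\qquad
A^6 = \begin{pmatrix} 232 & 0 & 560 \\ 532 & 64 & 1484 \\ 560 & 0 & 1352 \end{pmatrix},
$$
we get $\mathbf{x}^{(5)} = A^5 \mathbf{x} = (232, 628, 560)$ and $\mathbf{x}^{(6)} = A^6 \mathbf{x} = (792, 2144, 1912)$. This leads to
$$
\lambda_1 \approx \frac{\mathbf{x}^{(6)}}{\mathbf{x}^{(5)}} = \left( \frac{792}{232},\; \frac{2144}{628},\; \frac{1912}{560} \right) \approx (3.4138,\; 3.4140,\; 3.4143)
$$
(the last component is $3.414286$), and the e-vector is $\mathbf{u}^{(1)} \approx \mathbf{x}^{(6)}/2144 = (0.3694, 1, 0.8918)$.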
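The numbers of this example can be reproduced with a short Python sketch. The matrix is taken as $A = ((1, 0, 1), (-1, 2, 2), (1, 0, 3))$, a sign reconstruction that agrees with the printed powers $A^5$ and $A^6$; the helper names `mat_vec` and `power_method` are assumptions. In `power_method` each iterate is rescaled by its component of largest magnitude, a common way to avoid overflow; that scale factor tends to $\lambda_1$.

```python
def mat_vec(A, x):
    """Matrix-vector product A x for lists of numbers."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def power_method(A, x, iterations):
    """Power iteration with rescaling; returns (eigenvalue estimate, eigenvector)."""
    c = 0.0
    for _ in range(iterations):
        y = mat_vec(A, x)
        c = max(y, key=abs)      # component of largest magnitude
        x = [v / c for v in y]   # rescale so that this component equals 1
    return c, x


A = [[1, 0, 1], [-1, 2, 2], [1, 0, 3]]

# Raw iterates x^(k) = A^k x, in exact integer arithmetic.
iterates = [[1, 2, 1]]
for _ in range(6):
    iterates.append(mat_vec(A, iterates[-1]))
x5, x6 = iterates[5], iterates[6]
print(x5, x6)
print([b / a for a, b in zip(x5, x6)])  # component-wise ratios, eq. (29)

lam1, u1 = power_method(A, [1.0, 2.0, 1.0], 30)
print(lam1, u1)
```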

Eigenvalues and Eigenvectors : Aitken Acceleration

The elements of the vector $\mathbf{r}^{(k)}$ in eqn (29) are approximations to the exact e-value $\lambda_1$, and the error will be $\epsilon_k = r_k - \lambda_1$, where $r_k$ is any element of $\mathbf{r}^{(k)}$. Likewise, for $\mathbf{r}^{(k+1)}$ we get $\epsilon_{k+1} = r_{k+1} - \lambda_1$. Since the convergence of the method is linear, i.e.
$$
\epsilon_{k+1} \approx c\,\epsilon_k \quad \text{with } c \text{ a constant}, \tag{30}
$$
we can use Aitken's method to accelerate the convergence. That is, we will use the formula derived in the previous section for 3 approximate values $r_k$, $r_{k+1}$ and $r_{k+2}$ to find a better approximation to the e-value, i.e.
$$
\lambda_1 \approx \frac{r_k\, r_{k+2} - r_{k+1}^2}{r_{k+2} - 2 r_{k+1} + r_k}. \tag{31}
$$
Application: From the example of the previous slide (2nd component of the ratios), $r_5 = 3.414$, $r_4 = 3.413$ and $r_3 = 3.407$. Then Aitken's method gives $\lambda_1 \approx 3.4142$.
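A small Python sketch of formula (31), applied to the ratios of the power-method example, is given below. The matrix $A = ((1, 0, 1), (-1, 2, 2), (1, 0, 3))$ and the start vector $(1, 2, 1)$ are taken from that example (with reconstructed signs); the function name `aitken` is an assumption.

```python
def mat_vec(A, x):
    """Matrix-vector product A x for lists of numbers."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def aitken(r0, r1, r2):
    """Aitken extrapolation from three successive approximations, eq. (31)."""
    return (r0 * r2 - r1 * r1) / (r2 - 2.0 * r1 + r0)


A = [[1, 0, 1], [-1, 2, 2], [1, 0, 3]]

# Power iterates x^(0), ..., x^(6) for the trial vector (1, 2, 1).
iterates = [[1, 2, 1]]
for _ in range(6):
    iterates.append(mat_vec(A, iterates[-1]))

# Ratios of the 2nd component: r_k = x^(k+1)_2 / x^(k)_2.
r = {k: iterates[k + 1][1] / iterates[k][1] for k in (3, 4, 5)}
estimate = aitken(r[3], r[4], r[5])

exact = 2 + 2 ** 0.5
print(r[5], abs(r[5] - exact))       # plain ratio and its error
print(estimate, abs(estimate - exact))  # accelerated value and its error
```

The extrapolated value is considerably closer to $2 + \sqrt{2}$ than the last plain ratio $r_5$, although only three ratios were used.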

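Eigenvalues and Eigenvectors : Inverse Power Method

Theorem: If $\lambda$ is an e-value of a nonsingular matrix $A$, then $\lambda^{-1}$ is an e-value of $A^{-1}$.

Thus if $|\lambda_1| > |\lambda_2| \geq \cdots \geq |\lambda_{N-1}| > |\lambda_N| > 0$ are the e-values of $A$, then
$$
|\lambda_N^{-1}| > |\lambda_{N-1}^{-1}| \geq \cdots \geq |\lambda_1^{-1}| > 0 \tag{32}
$$
will be the e-values of $A^{-1}$. Thus $\lambda_N^{-1}$ is the largest e-value of $A^{-1}$, and the power method applied to $A^{-1}$ gives the smallest e-value of $A$.

To find this e-value, instead of calculating the inverse of the matrix $A$ to get the expression $\left(A^{-1}\right)^{k+1}\mathbf{x}$, we can solve by Gaussian elimination the system $A\,\mathbf{x}^{(k+1)} = \mathbf{x}^{(k)}$, where $\mathbf{x}^{(k)}$ is defined as $\mathbf{x}^{(k)} = A^{-k}\mathbf{x}$. That is, for an initial $\mathbf{x}^{(0)}$ we get $\mathbf{x}^{(1)} = A^{-1}\mathbf{x}^{(0)}$, i.e. $A\,\mathbf{x}^{(1)} = \mathbf{x}^{(0)}$. Thus the solution of the linear system defines the value of $\mathbf{x}^{(1)}$. With this procedure we get the vector $\mathbf{x}^{(k+1)}$ via
$$
\mathbf{x}^{(k+1)} = A^{-1}\mathbf{x}^{(k)} \quad \Longrightarrow \quad A\,\mathbf{x}^{(k+1)} = \mathbf{x}^{(k)} \tag{33}
$$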
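The procedure of eq. (33) can be sketched in Python as follows. This is an illustration under stated assumptions: the helper names `solve` and `inverse_power` are invented here, the test matrix is $A = ((1, 0, 1), (-1, 2, 2), (1, 0, 3))$ from the power-method example (with reconstructed signs), and each iterate is rescaled by its component of largest magnitude, which then tends to $1/\lambda_N$.

```python
def solve(A, b):
    """Gaussian elimination with partial pivoting and back substitution."""
    n = len(A)
    M = [[float(v) for v in A[i]] + [float(b[i])] for i in range(n)]
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(M[r][i]))
        if M[p][i] == 0.0:
            raise ValueError("matrix is singular")
        M[i], M[p] = M[p], M[i]
        for r in range(i + 1, n):
            factor = M[r][i] / M[i][i]
            for c in range(i, n + 1):
                M[r][c] -= factor * M[i][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def inverse_power(A, x, iterations):
    """Inverse power iteration: returns (smallest |eigenvalue| of A, eigenvector)."""
    c = 1.0
    for _ in range(iterations):
        y = solve(A, x)          # y = A^{-1} x, obtained without forming A^{-1}
        c = max(y, key=abs)      # tends to the dominant e-value of A^{-1}
        x = [v / c for v in y]
    return 1.0 / c, x


A = [[1, 0, 1], [-1, 2, 2], [1, 0, 3]]
lam_min, u = inverse_power(A, [1.0, 2.0, 1.0], 50)
print(lam_min, u)
```

Since the same matrix is used in every step, a production code would factorize it once and reuse the factors; the sketch repeats the elimination only for brevity.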

Eigenvalues and Eigenvectors : Shifting Power Method

Theorem: If the $N$ values $\lambda_i$ $(i = 1, \ldots, N)$ are the e-values of an $N \times N$ matrix $A$, then for any complex number $\mu$ the matrix $A - \mu I$ (where $I$ is the unit matrix) will have as e-values the complex numbers $(\lambda_i - \mu)$ for $i = 1, \ldots, N$.

Thus, if in the interval $(\lambda_N, \lambda_1)$ we consider a point $\mu$ such that for one e-value $\lambda_j$ it holds that $0 < |\lambda_j - \mu| < \epsilon$, while for all the rest $|\lambda_i - \mu| > \epsilon$, then $\lambda_j - \mu$ will be the e-value of $A - \mu I$ with the smallest absolute value. This means that we can estimate it by using the inverse power method.

Remember that you need to calculate $\mathbf{x}^{(k+1)}$ by solving the system:
$$
\left( A - \mu I \right) \mathbf{x}^{(k+1)} = \mathbf{x}^{(k)}
$$
The ratio $r_k$ then approximates $1/(\lambda_j - \mu)$, so the actual e-value $\lambda_j$ will be:
$$
\lambda_j = \frac{1}{r_k} + \mu. \tag{34}
$$
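A possible Python sketch of the shifted inverse power method is shown below. The helper names, the test matrix $A = ((1, 0, 1), (-1, 2, 2), (1, 0, 3))$ (from the power-method example, with reconstructed signs), the shift $\mu = 1.9$ and the start vector $(1, 1, 1)$ are all assumptions for this illustration. The start vector is chosen to have a nonzero component along the e-vector $(0, 1, 0)$ of the targeted e-value $\lambda = 2$; the vector $(1, 2, 1)$ would not.

```python
def solve(A, b):
    """Gaussian elimination with partial pivoting and back substitution."""
    n = len(A)
    M = [[float(v) for v in A[i]] + [float(b[i])] for i in range(n)]
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(M[r][i]))
        if M[p][i] == 0.0:
            raise ValueError("matrix is singular")
        M[i], M[p] = M[p], M[i]
        for r in range(i + 1, n):
            factor = M[r][i] / M[i][i]
            for c in range(i, n + 1):
                M[r][c] -= factor * M[i][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def shifted_inverse_power(A, mu, x, iterations):
    """Estimate the e-value of A closest to the shift mu, following eq. (34)."""
    n = len(A)
    # Build the shifted matrix A - mu I.
    B = [[A[r][c] - (mu if r == c else 0.0) for c in range(n)] for r in range(n)]
    r_k = 1.0
    for _ in range(iterations):
        y = solve(B, x)          # (A - mu I) y = x
        r_k = max(y, key=abs)    # tends to 1 / (lambda_j - mu)
        x = [v / r_k for v in y]
    return 1.0 / r_k + mu, x


A = [[1, 0, 1], [-1, 2, 2], [1, 0, 3]]
lam, u = shifted_inverse_power(A, 1.9, [1.0, 1.0, 1.0], 30)
print(lam, u)
```

The closer $\mu$ is to the wanted e-value, the faster the iteration converges, because the dominant e-value $1/(\lambda_j - \mu)$ of the inverse of the shifted matrix becomes much larger than all the others.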