Lecture 5: Linear Systems of Equations / Matrix Splitting
Bojana Rosić, Thilo Moshagen
Institute of Scientific Computing
Motivation

Let us resolve the circuit problem using Kirchhoff's laws. Current law: the algebraic sum of all currents flowing toward a node is equal to zero,

I_1 + I_2 - I_3 = 0.

Voltage law: in any closed circuit, the algebraic sum of all voltages around the loop is equal to zero,

I_1 R_1 + I_3 R_3 = V_1
I_2 R_2 - I_3 R_3 = V_2
Motivation

Hence, we end up with the system of equations

I_1 + I_2 - I_3 = 0
I_1 R_1 + I_3 R_3 = V_1
I_2 R_2 - I_3 R_3 = V_2

which is easy to solve algebraically by elimination, I_1 = I_3 - I_2, and substitution,

(I_3 - I_2) R_1 + I_3 R_3 = V_1,   I_2 = (-V_1 + I_3 (R_3 + R_1)) R_1^(-1)

and so on...
What to do in this case?

A large system like this is difficult to solve algebraically.
But the engineering goal is to solve large-scale (realistic) systems. This usually amounts to solving

x = F(x)

in which x is the system state.
Special case

A special case are systems in linear form

Ax = b

in which the matrix A can be of very large dimension (think of A ∈ R^(n×n) with n in the thousands or millions). Examples are: solving linear partial differential equations (elasticity, the heat equation, etc.). In these cases one cannot simply compute x = A^(-1) b, as one would run into problems with memory or computation time.
Linear systems

There are many methods one can use to solve the system Ax = b, such as:

Stationary iterative methods (approximate operator):
- Jacobi (JM)
- Gauss-Seidel (GS)
- Successive over-relaxation (SOR)

Krylov subspace methods:
- Conjugate Gradient (CG)
- Biconjugate gradient method (BiCG)
- Generalized minimal residual method (GMRES)

or a combination of both.
Solving the system

Remembering fixed point iteration, we may try something similar. Let us guess the solution by putting x = x^(k). Since we do not know whether our guess is correct, we may check its accuracy by evaluating the residual

d^(k) = b - A x^(k).

Now we may make an educated next guess by drawing conclusions from the residual.
What would be our next guess?

Our next guess x^(k+1) will be smaller or larger than x^(k) depending on the residual d^(k). Hence,

x^(k+1) = x^(k) + v^(k)

in which v^(k) denotes the correction. Our goal is now to find the best correction! Naturally, the best correction v^(k) would be the one which satisfies

x^(k) + v^(k) = x,

so that the error vanishes. The only problem is that we do not know x!
But we do know that the best correction v^(k) has to satisfy

A x = A (x^(k) + v^(k)) = b.

This in turn gives us an equation for the correction,

A v^(k) = b - A x^(k) = d^(k),

which further yields

v^(k) = A^(-1) (b - A x^(k)) = A^(-1) d^(k).

This is explicit solving again.
Solution: pose the problem differently

Do not invert the matrix A, but find some simpler matrix C ≈ A which is easy to invert! Then the problem v^(k) = A^(-1) d^(k) reduces to

v^(k) = C^(-1) d^(k),   i.e.   C v^(k) = d^(k),

which takes only a small computational time.
Linear iteration scheme

Then we can formulate the general linear iteration scheme

x^(k+1) = x^(k) + C^(-1) (b - A x^(k)),

where C^(-1)(b - A x^(k)) = v^(k) is the correction, and C can be chosen in different ways. With respect to the type of approximation one may distinguish:
- the Jacobi method
- the Gauss-Seidel method
- the successive over-relaxation method, etc.

These methods choose C from the LDU decomposition of the matrix A.
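The general scheme above can be sketched in a few lines of NumPy (an illustrative sketch, not the lecture's code; the function and variable names are my own). The method is fixed entirely by the choice of C, here passed in as a solver for C v = d:

```python
import numpy as np

def splitting_iteration(A, b, solve_C, x0, tol=1e-10, max_iter=500):
    """General linear iteration x^(k+1) = x^(k) + C^(-1)(b - A x^(k)).

    solve_C(d) must return the solution v of C v = d, where C ~ A
    is the easy-to-invert approximation chosen by the method.
    """
    x = np.array(x0, dtype=float)
    for k in range(max_iter):
        d = b - A @ x               # residual d^(k)
        if np.linalg.norm(d) < tol:
            return x, k
        x = x + solve_C(d)          # add the correction v^(k)
    return x, max_iter

# Choosing C = D = diag(A) gives the Jacobi method:
A = np.array([[2.0, 1.0], [5.0, 7.0]])
b = np.array([11.0, 13.0])
x, k = splitting_iteration(A, b, lambda d: d / np.diag(A), np.ones(2))
```

Choosing C = L + D instead would give Gauss-Seidel; only `solve_C` changes.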
LDU decomposition

A = L + D + U,

where L is a strictly lower triangular matrix (main diagonal equal to zero), D is a diagonal matrix and U is a strictly upper triangular matrix (main diagonal equal to zero):

L_ij = a_ij for j < i,   D = diag(a_ii),   U_ij = a_ij for j > i.

[Figure: sparsity patterns of A and of its parts L, D and U.]
Jacobi method

Let us take C to be approximated only by the diagonal, C := D = diag(a_ii). Then it follows

x^(k+1) = x^(k) + D^(-1) (b - A x^(k)),

where D^(-1)(b - A x^(k)) = v^(k) and

(b - A x^(k))_i = b_i - Σ_{j=1}^{n} a_ij x_j^(k).
Jacobi method

The iterative procedure in element-wise form reads:

x_i^(k+1) = (1/a_ii) (b_i - Σ_{j=1, j≠i}^{n} a_ij x_j^(k)),   i = 1, ..., n.

Hence, the values a_ii must be different from zero.
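As a sketch (my own notation, assuming NumPy), the element-wise Jacobi sweep can be written as:

```python
import numpy as np

def jacobi(A, b, x0, num_iter):
    """Element-wise Jacobi sweep:
    x_i^(k+1) = (b_i - sum_{j != i} a_ij x_j^(k)) / a_ii.
    Requires a_ii != 0 for all i.
    """
    n = len(b)
    x = np.array(x0, dtype=float)
    for _ in range(num_iter):
        x_new = np.empty(n)
        for i in range(n):
            s = sum(A[i, j] * x[j] for j in range(n) if j != i)
            x_new[i] = (b[i] - s) / A[i, i]
        x = x_new   # every component was computed from the OLD iterate
    return x
```

Note that all components of the new iterate are computed from the old one; Gauss-Seidel will change exactly this point.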
Exercise

Let us use the Jacobi method to solve

2x + y = 11
5x + 7y = 13

starting at x^(0) = (1, 1)^T. Hence,

A = [2 1; 5 7],   b = (11, 13)^T.
Exercise

The first iteration reads

x^(1) = x^(0) + D^(-1) (b - A x^(0)),

where D is the diagonal part of the matrix A:

D = [2 0; 0 7],   with the inverse   D^(-1) = [1/2 0; 0 1/7].
Exercise

Hence,

x^(1) = (1, 1)^T + [1/2 0; 0 1/7] ((11, 13)^T - [2 1; 5 7] (1, 1)^T) = (5, 1.1429)^T.

After 22 iterations one obtains:

x^(22) = (7.1111, -3.2222)^T.
Exercise

Let us check the solution:

A x^(22) = [2 1; 5 7] (7.1111, -3.2222)^T = (11.0000, 13.0001)^T ≈ (11, 13)^T = b.
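The iteration above is easy to reproduce numerically (a quick NumPy check of my own, not part of the lecture):

```python
import numpy as np

A = np.array([[2.0, 1.0], [5.0, 7.0]])
b = np.array([11.0, 13.0])
D_inv = np.diag(1.0 / np.diag(A))     # inverse of the diagonal part

x = np.array([1.0, 1.0])              # x^(0)
for k in range(22):
    x = x + D_inv @ (b - A @ x)       # x^(k+1) = x^(k) + D^(-1)(b - A x^(k))

# x is now approximately (7.1111, -3.2222) and A @ x approximately b
residual = b - A @ x
```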
Exercise

Let us use the Jacobi method to solve

3x + 2y - z = 1
2x - 2y + 4z = -2
-x + (1/2)y - z = 0

Hence,

A = [3 2 -1; 2 -2 4; -1 0.5 -1],   b = (1, -2, 0)^T.
Exercise

The diagonal part of the matrix is

D = [3 0 0; 0 -2 0; 0 0 -1],   D^(-1) = [1/3 0 0; 0 -1/2 0; 0 0 -1],

and the Jacobi method reads:

x^(k+1) = x^(k) + D^(-1) (b - A x^(k)).
Exercise

Start with x^(0) = (0, 0, 0)^T, such that

x^(1) = x^(0) + D^(-1) (b - A x^(0)) = D^(-1) b

holds.
Exercise

Hence,

x^(1) = [1/3 0 0; 0 -1/2 0; 0 0 -1] (1, -2, 0)^T = (1/3, 1, 0)^T.

Similarly, one may write

x^(2) = x^(1) + D^(-1) (b - A x^(1)) = (-0.3333, 1.3333, 0.1667)^T.
Exercise

Iter     1        2        3      ...
x_1    0.3333  -0.3333  -0.5000   ...
x_2    1.0000   1.3333   1.0000   ...
x_3    0.0000   0.1667   1.0000   ...

By iterating one notices that the solution does not converge: the entries of the iterates keep growing in magnitude without bound. This happens even when changing the initial guess. The question is: why does this happen?
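Running the same iteration in NumPy (a sketch of my own) shows the blow-up directly; the norm of the iterates keeps growing from one sweep to the next:

```python
import numpy as np

A = np.array([[3.0, 2.0, -1.0],
              [2.0, -2.0, 4.0],
              [-1.0, 0.5, -1.0]])
b = np.array([1.0, -2.0, 0.0])
D_inv = np.diag(1.0 / np.diag(A))

x = np.zeros(3)                        # x^(0) = (0, 0, 0)^T
norms = []
for k in range(150):
    x = x + D_inv @ (b - A @ x)        # Jacobi sweep
    norms.append(np.linalg.norm(x))
# norms[0] is of order 1; the last entries are astronomically large
```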
Convergence

From

x^(k+1) = x^(k) + D^(-1) (b - A x^(k)) = F(x^(k))

we may conclude that this is a kind of fixed point iteration scheme with

F'(x) = I - D^(-1) A.

Since A = D + L + U, with R = L + U being the rest of the matrix,

F'(x) = I - D^(-1) (D + R) = -D^(-1) R.
Convergence

To check whether the method is convergent, one has to check contractivity:

q = sup ‖F'‖ = ‖D^(-1) R‖_2 < 1.

The sharp version of this condition is stated via the spectral radius,

ρ := max_i |λ_i| < 1,

where λ_i are the eigenvalues of D^(-1) R.
Exercise

In the first example one has

D^(-1) R = [0 0.5; 0.7143 0].

Hence, ρ = 0.5976 < 1.
Exercise

In the second example one has

D^(-1) R = [0 0.6667 -0.3333; -1.0 0 -2.0; 1.0 -0.5 0].

Hence, ρ ≈ 1.14 > 1.
Can we draw this conclusion a priori?

If the matrix A is strictly or irreducibly diagonally dominant, the Jacobi method will converge. Strict row diagonal dominance means that in each row the absolute value of the diagonal term is greater than the sum of the absolute values of the other terms:

|a_ii| > Σ_{j≠i} |a_ij|.

This stems from (see Golub and Van Loan, Matrix Computations)

ρ(I - D^(-1)(D + R)) ≤ ‖I - D^(-1)(D + R)‖_∞ = max_i Σ_{j≠i} |a_ij| / |a_ii| < 1.
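The row-dominance test is cheap to implement (a small helper of my own, assuming NumPy):

```python
import numpy as np

def strictly_diag_dominant(A):
    """True if |a_ii| > sum_{j != i} |a_ij| for every row i."""
    A = np.asarray(A, dtype=float)
    diag = np.abs(np.diag(A))
    off_diag = np.abs(A).sum(axis=1) - diag
    return bool(np.all(diag > off_diag))

# First example: strictly dominant, so Jacobi is guaranteed to converge.
print(strictly_diag_dominant([[2, 1], [5, 7]]))          # True
# Second example: the first row ties the diagonal -> no guarantee.
print(strictly_diag_dominant([[3, 2, -1],
                              [2, -2, 4],
                              [-1, 0.5, -1]]))           # False
```

This matches the observed behaviour: the first system converged, the second diverged.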
Gauss-Seidel method

Going back to the general scheme

x^(k+1) = x^(k) + C^(-1) (b - A x^(k)),

we may take the approximation C to be equal to

C := L + D.

Then it follows

x^(k+1) = x^(k) + (L + D)^(-1) (b - A x^(k)).
Gauss-Seidel method

However, we said before that inversion is avoided in numerical schemes. For this reason, let us multiply both sides of the equation by (L + D):

(L + D) x^(k+1) = (L + D) x^(k) + b - A x^(k).

Using A = L + D + U we obtain

(L + D) x^(k+1) = (L + D) x^(k) + b - (L + D + U) x^(k) = b - U x^(k).

Splitting the left side of the equation, one obtains

D x^(k+1) = b - L x^(k+1) - U x^(k).
Gauss-Seidel method

In component form one obtains

a_ii x_i^(k+1) = b_i - Σ_{j=1}^{i-1} a_ij x_j^(k+1) - Σ_{j=i+1}^{n} a_ij x_j^(k),   i = 1, ..., n,

and

x_i^(k+1) = (1/a_ii) (b_i - Σ_{j=1}^{i-1} a_ij x_j^(k+1) - Σ_{j=i+1}^{n} a_ij x_j^(k)),   i = 1, ..., n.
Comparing the Jacobi and GS methods

The Jacobi method in element-wise form reads:

x_i^(k+1) = (1/a_ii) (b_i - Σ_{j=1, j≠i}^{n} a_ij x_j^(k)),   i = 1, ..., n,

and Gauss-Seidel:

x_i^(k+1) = (1/a_ii) (b_i - Σ_{j=1}^{i-1} a_ij x_j^(k+1) - Σ_{j=i+1}^{n} a_ij x_j^(k)),   i = 1, ..., n.

A more motivating view on GS: it is the Jacobi method using the latest available information.
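Side by side with the Jacobi sketch earlier, Gauss-Seidel differs only in overwriting x in place, so that components 1, ..., i-1 already hold the new values (again my own illustrative code, assuming NumPy):

```python
import numpy as np

def gauss_seidel(A, b, x0, num_iter):
    """Element-wise Gauss-Seidel sweep:
    x_i^(k+1) = (b_i - sum_{j<i} a_ij x_j^(k+1)
                     - sum_{j>i} a_ij x_j^(k)) / a_ii.
    """
    n = len(b)
    x = np.array(x0, dtype=float)
    for _ in range(num_iter):
        for i in range(n):
            s_new = sum(A[i, j] * x[j] for j in range(i))         # new values
            s_old = sum(A[i, j] * x[j] for j in range(i + 1, n))  # old values
            x[i] = (b[i] - s_new - s_old) / A[i, i]
    return x
```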
Exercise

Let us use the Gauss-Seidel method to solve

2x + y = 11
5x + 7y = 13

starting at x^(0) = (1, 1)^T. Hence,

A = [2 1; 5 7],   b = (11, 13)^T.
Exercise

The first iteration reads

x_i^(1) = (1/a_ii) (b_i - Σ_{j=1}^{i-1} a_ij x_j^(1) - Σ_{j=i+1}^{n} a_ij x_j^(0)),   i = 1, ..., n:

x_1^(1) = (1/a_11) (b_1 - a_12 x_2^(0)) = (11 - 1)/2 = 5
x_2^(1) = (1/a_22) (b_2 - a_21 x_1^(1)) = (13 - 25)/7 = -1.7143

It takes fewer iterations than the Jacobi method to reach the solution.
Exercise

Let us use the Gauss-Seidel method to solve

3x + 2y - z = 1
2x - 2y + 4z = -2
-x + (1/2)y - z = 0

Hence,

A = [3 2 -1; 2 -2 4; -1 0.5 -1],   b = (1, -2, 0)^T.
Exercise

The first iteration reads

x_i^(1) = (1/a_ii) (b_i - Σ_{j=1}^{i-1} a_ij x_j^(1) - Σ_{j=i+1}^{n} a_ij x_j^(0)):

x_1^(1) = (1/a_11) (b_1 - a_12 x_2^(0) - a_13 x_3^(0)) = 0.3333
x_2^(1) = (1/a_22) (b_2 - a_21 x_1^(1) - a_23 x_3^(0)) = 1.3333
x_3^(1) = (1/a_33) (b_3 - a_31 x_1^(1) - a_32 x_2^(1)) = 0.3333
Exercise

Similarly to the Jacobi method, Gauss-Seidel does not converge for this system of equations.
Convergence

From

x^(k+1) = x^(k) + (L + D)^(-1) (b - A x^(k))

we may conclude that this is a kind of fixed point iteration scheme with

F'(x) = I - (L + D)^(-1) A.

Since A = L + D + U, one has

F'(x) = I - (L + D)^(-1) (L + D + U) = -(L + D)^(-1) U.
Convergence

Thus, the method is convergent if the spectral radius satisfies

ρ((L + D)^(-1) U) < 1.

The convergence properties of the Gauss-Seidel method depend on the matrix A. Namely, the procedure is known to converge if either:
- A is symmetric positive-definite (A = A^T with all eigenvalues positive), or
- A is strictly or irreducibly diagonally dominant.
Convergence

In the first example one has

ρ((L + D)^(-1) U) = 0.357 < 1,

and in the second example

ρ((L + D)^(-1) U) ≈ 1.24 > 1.
Successive over-relaxation method (SOR)

The SOR scheme tries to improve the convergence properties of the Gauss-Seidel method. The idea is to introduce a relaxation parameter ω > 0 such that

C := D/ω + L.

Inserting yields

x^(k+1) = x^(k) + (D/ω + L)^(-1) (b - A x^(k)).

Multiplying by (D/ω + L) and using A = L + D + U one obtains

(D/ω + L) x^(k+1) = ((1 - ω)/ω) D x^(k) + b - U x^(k).
SOR method

Then

(1/ω) D x^(k+1) = ((1 - ω)/ω) D x^(k) - L x^(k+1) + b - U x^(k)

and

D x^(k+1) = (1 - ω) D x^(k) + ω (b - L x^(k+1) - U x^(k)).

The representation in components is now given by

a_ii x_i^(k+1) = (1 - ω) a_ii x_i^(k) + ω (b_i - Σ_{j=1}^{i-1} a_ij x_j^(k+1) - Σ_{j=i+1}^{n} a_ij x_j^(k))

and

x_i^(k+1) = (1 - ω) x_i^(k) + (ω/a_ii) (b_i - Σ_{j=1}^{i-1} a_ij x_j^(k+1) - Σ_{j=i+1}^{n} a_ij x_j^(k)).
SOR method

Note that for ω = 1 the method

D x^(k+1) = (1 - ω) D x^(k) + ω (b - L x^(k+1) - U x^(k))

becomes

D x^(k+1) = b - L x^(k+1) - U x^(k),

which is the Gauss-Seidel method.
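A sketch of the component form (my own code, assuming NumPy); setting omega = 1 must reproduce the Gauss-Seidel sweep:

```python
import numpy as np

def sor(A, b, x0, omega, num_iter):
    """Element-wise SOR sweep:
    x_i^(k+1) = (1 - omega) x_i^(k)
              + (omega / a_ii) (b_i - sum_{j<i} a_ij x_j^(k+1)
                                     - sum_{j>i} a_ij x_j^(k)).
    """
    n = len(b)
    x = np.array(x0, dtype=float)
    for _ in range(num_iter):
        for i in range(n):
            s_new = sum(A[i, j] * x[j] for j in range(i))
            s_old = sum(A[i, j] * x[j] for j in range(i + 1, n))
            x[i] = (1 - omega) * x[i] + omega * (b[i] - s_new - s_old) / A[i, i]
    return x
```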
SOR: how to choose ω?

With an optimal choice of ω we can improve the convergence of the SOR method in comparison with the Gauss-Seidel method. But the optimal value of ω depends on the problem, i.e. in general one has to try several values. One may start with larger values, e.g. ω = 1.7, then ω = 1.3, ..., and take a look at the convergence behaviour.
Convergence

From

x^(k+1) = x^(k) + (D/ω + L)^(-1) (b - A x^(k))

we may conclude that this is a kind of fixed point iteration scheme with

F'(x) = I - (D/ω + L)^(-1) A,

and hence the spectral radius must satisfy

ρ(I - (D/ω + L)^(-1) A) < 1

to obtain convergence.
Convergence

For ω = 1.7 the spectral radius in the first of the previous two examples becomes

ρ(I - (D/ω + L)^(-1) A) = 0.7 < 1,

and in the second example

ρ(I - (D/ω + L)^(-1) A) = 3.8 > 1.
General convergence results

The equation of the general iteration method,

x^(k+1) = x^(k) + C^(-1) (b - A x^(k)),

can be transformed into

x^(k+1) = C^(-1) b + (I - C^(-1) A) x^(k),

i.e.

x^(k+1) = M x^(k) + g,

in which M = I - C^(-1) A and g = C^(-1) b. This equation is often called the normal form. Every linear iteration method can be written in normal form. The matrix M is called the iteration matrix of the method.
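Both the iteration matrix and its spectral radius are one-liners; the following sketch (my own, assuming NumPy) recomputes the radii quoted for the two example systems:

```python
import numpy as np

def iteration_matrix(A, C):
    """M = I - C^(-1) A from the normal form x^(k+1) = M x^(k) + g."""
    return np.eye(len(A)) - np.linalg.solve(C, A)

def spectral_radius(M):
    return float(max(abs(np.linalg.eigvals(M))))

A1 = np.array([[2.0, 1.0], [5.0, 7.0]])
A2 = np.array([[3.0, 2.0, -1.0],
               [2.0, -2.0, 4.0],
               [-1.0, 0.5, -1.0]])

rho_jac_1 = spectral_radius(iteration_matrix(A1, np.diag(np.diag(A1))))  # ~0.60
rho_gs_1 = spectral_radius(iteration_matrix(A1, np.tril(A1)))            # ~0.36
rho_jac_2 = spectral_radius(iteration_matrix(A2, np.diag(np.diag(A2))))  # ~1.14
rho_gs_2 = spectral_radius(iteration_matrix(A2, np.tril(A2)))            # ~1.24
```

The radii below 1 belong to the convergent runs, those above 1 to the divergent ones.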
General convergence results

For the derivation of convergence criteria we need the spectral radius of a matrix, which is defined in the following way:

Definition. For a given matrix M ∈ R^(n,n) let λ_i, i = 1, ..., n, be the eigenvalues of M. Then

ρ(M) := max_{i=1,...,n} |λ_i|

is called the spectral radius of M.
General convergence results

Theorem. The linear iteration method with iteration matrix M converges for an arbitrary start vector x^(0) if and only if ρ(M) < 1. If, in addition, in some matrix norm ‖·‖ the condition ‖M‖ < 1 is satisfied, then the following error estimates hold:

‖x^(k) - x‖ ≤ (‖M‖^k / (1 - ‖M‖)) ‖x^(1) - x^(0)‖,    (1)
‖x^(k) - x‖ ≤ (‖M‖ / (1 - ‖M‖)) ‖x^(k) - x^(k-1)‖.    (2)

Inequality (1) is called the a priori estimate and (2) the a posteriori estimate.
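The a posteriori bound (2) can be checked numerically. In the sketch below (my own, assuming NumPy) the Jacobi iteration for the first example is monitored in the 2-norm, where ‖M‖_2 ≈ 0.71 < 1:

```python
import numpy as np

A = np.array([[2.0, 1.0], [5.0, 7.0]])
b = np.array([11.0, 13.0])
D_inv = np.diag(1.0 / np.diag(A))

M = np.eye(2) - D_inv @ A              # Jacobi iteration matrix
q = np.linalg.norm(M, 2)               # ||M||_2, about 0.71 < 1
x_exact = np.linalg.solve(A, b)

errs, bounds = [], []
x_prev = np.array([1.0, 1.0])          # x^(0)
x = x_prev + D_inv @ (b - A @ x_prev)  # x^(1)
for _ in range(10):
    x_prev, x = x, x + D_inv @ (b - A @ x)
    errs.append(np.linalg.norm(x - x_exact))
    bounds.append(q / (1 - q) * np.linalg.norm(x - x_prev))
```

Each true error stays below its a posteriori bound, as the theorem guarantees.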
Literature

- C. T. Kelley, Iterative Methods for Linear and Nonlinear Equations
- Y. Saad, Iterative Methods for Sparse Linear Systems
- M. Stoll, Solving Linear Systems using the Adjoint