Solving Systems of Linear Equations

Outline: Motivation; solving systems of linear equations; the idea behind Google's PageRank; an example from economics; Gaussian elimination; LU decomposition; iterative methods; the Jacobi method; summary.

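Motivation

Systems of linear equations of the form $Ax = b$, or in components

$$\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \dots + a_{1n}x_n &= b_1\\
a_{21}x_1 + a_{22}x_2 + \dots + a_{2n}x_n &= b_2\\
&\;\;\vdots\\
a_{m1}x_1 + a_{m2}x_2 + \dots + a_{mn}x_n &= b_m
\end{aligned}
\qquad
A = \begin{pmatrix} a_{11} & a_{12} & \dots & a_{1n}\\ a_{21} & a_{22} & \dots & a_{2n}\\ \vdots & \vdots & \ddots & \vdots\\ a_{m1} & a_{m2} & \dots & a_{mn}\end{pmatrix},\quad
b = \begin{pmatrix} b_1\\ b_2\\ \vdots\\ b_m\end{pmatrix}$$

arise in many contexts. For instance, we need them to find roots of systems of nonlinear equations with Newton's method.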
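To show where the linear systems enter Newton's method, here is a minimal sketch assuming NumPy. The function `newton_system` and the example system are hypothetical choices for illustration: each Newton step solves the linear system $J(x)\,\Delta x = -F(x)$, here with `numpy.linalg.solve`.

```python
import numpy as np

def newton_system(F, J, x0, tol=1e-12, max_iter=50):
    """Newton's method for F(x) = 0 in several variables (illustrative sketch)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        # Each step requires solving the linear system J(x) dx = -F(x).
        dx = np.linalg.solve(J(x), -F(x))
        x = x + dx
        if np.linalg.norm(dx) < tol:
            break
    return x

# Hypothetical example: x^2 + y^2 = 4 and x = y, with a root at x = y = sqrt(2).
def F(v):
    return np.array([v[0] ** 2 + v[1] ** 2 - 4.0, v[0] - v[1]])

def J(v):
    # Jacobian matrix of F
    return np.array([[2.0 * v[0], 2.0 * v[1]], [1.0, -1.0]])

root = newton_system(F, J, [1.0, 0.5])
print(root)   # approximately [1.41421356 1.41421356]
```

Any linear solver could replace `numpy.linalg.solve` here; the rest of this lecture is about how such solvers work.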

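Short Reminder (1)

Matrix-vector multiplication:

$$Ax = \begin{bmatrix} a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23}\\ a_{31} & a_{32} & a_{33}\end{bmatrix}\begin{bmatrix} x_1\\ x_2\\ x_3\end{bmatrix} = \begin{bmatrix} a_{11}x_1 + a_{12}x_2 + a_{13}x_3\\ a_{21}x_1 + a_{22}x_2 + a_{23}x_3\\ a_{31}x_1 + a_{32}x_2 + a_{33}x_3\end{bmatrix}$$

Matrix-matrix multiplication:

$$\begin{bmatrix} a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23}\\ a_{31} & a_{32} & a_{33}\end{bmatrix}\begin{bmatrix} b_{11} & b_{12} & b_{13}\\ b_{21} & b_{22} & b_{23}\\ b_{31} & b_{32} & b_{33}\end{bmatrix} = \begin{bmatrix} & \vdots & \\ \cdots & \sum_k a_{ik}b_{kj} & \cdots\\ & \vdots & \end{bmatrix}$$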
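The component formulas translate directly into loops. A minimal sketch in plain Python; the names `matvec` and `matmul` are hypothetical, and NumPy is only used to cross-check against its `@` operator:

```python
import numpy as np

def matvec(A, x):
    """(Ax)_i = sum_j a_ij x_j"""
    return [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(A))]

def matmul(A, B):
    """(AB)_ij = sum_k a_ik b_kj"""
    n, p, m = len(A), len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(p)) for j in range(m)]
            for i in range(n)]

A = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]
B = [[1, 0, 0], [0, 2, 0], [0, 0, 3]]
x = [1, 0, -1]

print(matvec(A, x))   # [-2, -2, -3]
print(matmul(A, B))   # [[1, 4, 9], [4, 10, 18], [7, 16, 30]]
assert np.array_equal(np.array(matmul(A, B)), np.array(A) @ np.array(B))
```

The triple loop hidden in `matmul` already hints at the $O(n^3)$ cost that reappears for Gaussian elimination.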

Short Reminder (2)

Graphical illustration of 2D systems: every equation represents a line. [Figure: non-singular vs. singular 2D systems]

The following situations are possible:
- the lines cross at exactly one point -> unique solution (non-singular);
- the lines are parallel and don't cross -> no solution (singular);
- the lines coincide -> infinitely many solutions (singular).

In higher dimensions the same situations are possible (each equation then describes a hyperplane).
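The three cases can be told apart numerically by comparing the rank of $A$ with the rank of the augmented matrix $[A \mid b]$ (the Rouché-Capelli theorem). A minimal sketch, assuming NumPy; `classify` is a hypothetical helper, and since `numpy.linalg.matrix_rank` uses an SVD-based tolerance, nearly singular systems are judged only up to rounding:

```python
import numpy as np

def classify(A, b):
    """Classify Ax = b by comparing rank(A) with the rank of [A | b]."""
    A = np.asarray(A, dtype=float)
    rank_A = np.linalg.matrix_rank(A)
    rank_Ab = np.linalg.matrix_rank(np.column_stack([A, b]))
    if rank_A < rank_Ab:
        return "no solution"               # e.g. parallel lines
    if rank_A == A.shape[1]:
        return "unique solution"           # lines cross at exactly one point
    return "infinitely many solutions"     # e.g. lines coincide

print(classify([[1, 1], [1, -1]], [2, 0]))  # unique solution
print(classify([[1, 1], [2, 2]], [1, 3]))   # no solution
print(classify([[1, 1], [2, 2]], [1, 2]))   # infinitely many solutions
```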

Short Reminder (3)

We focus on square matrices. Some criteria for an $n \times n$ matrix $A$ to be non-singular:
- the inverse of $A$ (denoted $A^{-1}$) exists;
- $\det(A) \neq 0$;
- $\operatorname{rank}(A) = n$;
- for any vector $z \neq 0$ it holds that $Az \neq 0$.
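In floating point these criteria can only be checked up to a tolerance. A minimal sketch, assuming NumPy; `nonsingularity_checks` is a hypothetical name, and the criterion "$Az \neq 0$ for all $z \neq 0$" is expressed through the equivalent condition that the smallest singular value is positive (with a tolerance of the same form as the one `numpy.linalg.matrix_rank` uses by default):

```python
import numpy as np

def nonsingularity_checks(A):
    """Evaluate several of the equivalent non-singularity criteria numerically."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    s = np.linalg.svd(A, compute_uv=False)     # singular values, largest first
    tol = s[0] * n * np.finfo(float).eps       # rounding-level threshold
    return {
        "det(A) != 0": not np.isclose(np.linalg.det(A), 0.0),
        "rank(A) == n": np.linalg.matrix_rank(A) == n,
        # Az != 0 for all z != 0  <=>  smallest singular value > 0
        "min singular value > 0": s[-1] > tol,
    }

print(nonsingularity_checks([[2, 1, -1], [-3, -1, 2], [-2, 1, 2]]))  # all True
print(nonsingularity_checks([[1, 2], [2, 4]]))                       # all False
```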

Example: How Google Search Works (sort of)

In a particular group of people, who is the most popular? Say there are three people: Jane, Charlie and Fred. We ask them to list their friends: J lists C; C lists J and F; F lists J. We could write this as a matrix (column = person who lists, row = person listed):

           J   C   F
    J      0   1   1
    C      1   0   0
    F      0   1   0
    total  1   2   1

Example (2)

Some people list more friends than others. To compensate, we normalise each column by its sum. Now we want to find a vector $p = (j, c, f)$ that assigns a popularity to each person. Idea: a person's popularity is the weighted sum of the popularities of the people who mention them, $p = Lp$, with the linking matrix

$$L = \begin{array}{c|ccc} & J & C & F\\ \hline J & 0 & 1/2 & 1\\ C & 1 & 0 & 0\\ F & 0 & 1/2 & 0\end{array}$$
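The normalised linking matrix can be built mechanically from the friend lists. A minimal sketch, assuming NumPy; the dictionary layout and the name `linking_matrix` are hypothetical. The entry (row = person listed, column = person who lists) is set to 1, and each column is then divided by its sum:

```python
import numpy as np

people = ["J", "C", "F"]
friends = {"J": ["C"], "C": ["J", "F"], "F": ["J"]}

def linking_matrix(people, friends):
    """Column-normalised linking matrix; assumes everyone lists at least one friend."""
    idx = {name: k for k, name in enumerate(people)}
    M = np.zeros((len(people), len(people)))
    for lister, listed in friends.items():
        for f in listed:
            M[idx[f], idx[lister]] = 1.0   # row = person listed, column = person who lists
    return M / M.sum(axis=0)               # divide each column by its sum

L = linking_matrix(people, friends)
print(L)   # rows: [0, 0.5, 1], [1, 0, 0], [0, 0.5, 0]
```

Every column of the result sums to 1, a property that is used when solving $Lp = p$.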

Example (3)

A node (person) is the more popular,
- the more people list it as a friend,
- the fewer friends these people list,
- the more popular these friends are
(i.e. having one very popular good friend can make you more popular than having 10 not-so-popular friends).

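Example (4)

This defines a system of linear equations!

$$Lp = \begin{pmatrix} 0 & 1/2 & 1\\ 1 & 0 & 0\\ 0 & 1/2 & 0\end{pmatrix}\begin{pmatrix} j\\ c\\ f\end{pmatrix} = \begin{pmatrix} j\\ c\\ f\end{pmatrix},\qquad\text{i.e.}\quad j = \tfrac12 c + f,\quad c = j,\quad f = \tfrac12 c.$$

In this case it is easy to solve. The system is underdetermined: e.g. set $j = 1$ as a scale, then $c = 1$ and $f = 1/2$, i.e. Charlie and Jane are most popular, Fred less so.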
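Because every column of $L$ sums to 1, the rows of $L - I$ sum to zero, so one equation of $(L - I)p = 0$ is redundant. A minimal sketch, assuming NumPy (`solve_with_scale` is a hypothetical name): replace the last equation by the scale condition $j = 1$ and solve the resulting square system. This assumes the remaining equations are independent, which holds here.

```python
import numpy as np

def solve_with_scale(L, fix_index=0):
    """Solve Lp = p with the normalisation p[fix_index] = 1."""
    n = L.shape[0]
    M = L - np.eye(n)        # Lp = p  <=>  (L - I)p = 0
    M[-1, :] = 0.0           # drop one redundant equation ...
    M[-1, fix_index] = 1.0   # ... and replace it by p[fix_index] = 1
    rhs = np.zeros(n)
    rhs[-1] = 1.0
    return np.linalg.solve(M, rhs)

L = np.array([[0.0, 0.5, 1.0],
              [1.0, 0.0, 0.0],
              [0.0, 0.5, 0.0]])
p = solve_with_scale(L)
print(p)   # (j, c, f) = (1, 1, 1/2)
```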

Remark

We could use this to assign a popularity to web pages. The friend lists from before would then correspond to the lists of links in web pages, i.e. the question becomes: how many web pages link to a given page (normalised)? The popularity $p$ could be interpreted as a page rank (or we might wish to add contextual information $c$ for search queries rather than base the ranking purely on popularity, e.g. $p = Lp + c$). We might wonder whether the system $Lp = p$ always has a positive solution $p$ -> for an irreducible (strongly connected) link graph this is ensured by the Perron-Frobenius theorem.
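For large link matrices one would not solve $Lp = p$ directly. A common alternative is power iteration: repeatedly apply $L$ to a positive starting vector and renormalise. A minimal sketch, assuming NumPy and a link graph for which the iteration converges (irreducible and aperiodic, which holds for the three-person example); `power_iteration` is a hypothetical name, and practical PageRank additionally uses a damping factor, which this sketch omits.

```python
import numpy as np

def power_iteration(L, num_iter=200):
    """Approximate the positive solution of Lp = p, normalised so that sum(p) = 1."""
    n = L.shape[0]
    p = np.full(n, 1.0 / n)     # positive starting vector
    for _ in range(num_iter):
        p = L @ p
        p /= p.sum()            # keep the scale fixed
    return p

L = np.array([[0.0, 0.5, 1.0],
              [1.0, 0.0, 0.0],
              [0.0, 0.5, 0.0]])
p = power_iteration(L)
print(p)   # approximately [0.4, 0.4, 0.2], i.e. (1, 1, 1/2) rescaled
```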

Example: Setting Prices in an Economy

This problem goes back to Wassily Leontief (who received the Nobel Prize in Economics in 1973). Suppose that in a faraway land called Eigenbazistan, in a small country town called Matrixville, there lived a Farmer, a Tailor, a Carpenter, a Coal Miner and Slacker Bob. The Farmer produced food; the Tailor, clothes; the Carpenter, housing; the Coal Miner supplied energy; and Slacker Bob made High Quality 100% Proof Moonshine, half of which he drank himself. Let's assume that:
- there is no outside supply or demand;
- everything produced is consumed.

We need to specify what fraction of each good is consumed by each person. Problem: what is the income of each person?

Consumption (fraction of each good consumed by each person; Moonshine = High Quality 100% Proof Moonshine)

                 Food   Clothes   Housing   Energy   Moonshine
    Farmer       0.25   0.15      0.25      0.18     0.20
    Tailor       0.15   0.28      0.18      0.17     0.05
    Carpenter    0.22   0.19      0.22      0.22     0.10
    Coal Miner   0.20   0.15      0.20      0.28     0.15
    Slacker Bob  0.18   0.23      0.15      0.15     0.50

I.e. the Farmer eats 25% of all food and consumes 15% of all clothes, 25% of all housing, 18% of all energy and 20% of the moonshine. Let's say the price levels are $p_F, p_T, p_{CA}, p_{CO}, p_S$. Since this is a closed economy, what each person spends must equal their income; for the Farmer we require

$$p_F = 0.25p_F + 0.15p_T + 0.25p_{CA} + 0.18p_{CO} + 0.20p_S$$

Again... a System of Linear Equations

$$\begin{aligned}
p_F &= 0.25p_F + 0.15p_T + 0.25p_{CA} + 0.18p_{CO} + 0.20p_S\\
p_T &= 0.15p_F + 0.28p_T + 0.18p_{CA} + 0.17p_{CO} + 0.05p_S\\
p_{CA} &= 0.22p_F + 0.19p_T + 0.22p_{CA} + 0.22p_{CO} + 0.10p_S\\
p_{CO} &= 0.20p_F + 0.15p_T + 0.20p_{CA} + 0.28p_{CO} + 0.15p_S\\
p_S &= 0.18p_F + 0.23p_T + 0.15p_{CA} + 0.15p_{CO} + 0.50p_S
\end{aligned}$$

Again this is a system of the type $Ap = p$ (i.e. underdetermined: we can set one price arbitrarily). Not so easy to solve by hand... computer methods would be useful!
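A minimal sketch of a computer solution, assuming NumPy. Here the one-dimensional solution space of $(A - I)p = 0$ is taken from the singular value decomposition (the right singular vector belonging to the smallest singular value) and then scaled so that Slacker Bob's price is $p_S = 1$; this is one possible approach, not the only one. The rows of `A` follow the consumption table (people), the columns the goods.

```python
import numpy as np

names = ["Farmer", "Tailor", "Carpenter", "Coal Miner", "Slacker Bob"]
# Rows: consumers; columns: food, clothes, housing, energy, moonshine
A = np.array([
    [0.25, 0.15, 0.25, 0.18, 0.20],
    [0.15, 0.28, 0.18, 0.17, 0.05],
    [0.22, 0.19, 0.22, 0.22, 0.10],
    [0.20, 0.15, 0.20, 0.28, 0.15],
    [0.18, 0.23, 0.15, 0.15, 0.50],
])

# Everything produced is consumed: each column sums to 1, so 1 is an eigenvalue of A.
assert np.allclose(A.sum(axis=0), 1.0)

# Solve (A - I)p = 0: the last row of Vt spans the null space.
_, s, Vt = np.linalg.svd(A - np.eye(5))
p = Vt[-1] / Vt[-1][-1]      # set one price arbitrarily: p_S = 1

for name, price in zip(names, p):
    print(f"{name:12s} {price:.4f}")
```

Since all entries of `A` are positive, the Perron-Frobenius theorem guarantees that this null space is one-dimensional and spanned by a positive vector, so the scaling step is well defined.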

Gaussian Elimination

One idea to mechanise solving such systems is Gaussian elimination. Consider the system

$$\begin{aligned} 2x + y - z &= 8 &\qquad (1)\\ -3x - y + 2z &= -11 &\qquad (2)\\ -2x + y + 2z &= -3 &\qquad (3)\end{aligned}$$

Observations: we can multiply equations by non-zero constants, swap equations, or add or subtract equations without changing the solution. Can we apply such operations to transform the system into a convenient form? What is a convenient form? -> triangular

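Gaussian Elimination (2)

Triangular form:

$$\begin{aligned} 2x + y - z &= 8 &\qquad (1)\\ \tfrac13 y + \tfrac13 z &= \tfrac23 &\qquad (2)\\ -z &= 1 &\qquad (3)\end{aligned}$$

If our system had the above form we could solve it by successive substitution: from (3), $z = -1$; substitute into (2): $\tfrac13 y - \tfrac13 = \tfrac23 \Rightarrow y = 3$; substitute into (1): $2x + 3 - (-1) = 8 \Rightarrow x = 2$.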
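Successive substitution on a triangular system is usually called back-substitution. A minimal sketch in plain Python (the name `back_substitution` is hypothetical), applied to the triangular system above:

```python
def back_substitution(U, y):
    """Solve Ux = y for an upper triangular U, starting from the last equation."""
    n = len(y)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        # subtract the contributions of the already known unknowns x[i+1], ..., x[n-1]
        s = sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (y[i] - s) / U[i][i]
    return x

U = [[2.0, 1.0, -1.0],
     [0.0, 1 / 3, 1 / 3],
     [0.0, 0.0, -1.0]]
y = [8.0, 2 / 3, 1.0]
x = back_substitution(U, y)
print(x)   # approximately [2.0, 3.0, -1.0]
```

Each unknown costs one pass over the entries to its right, so back-substitution needs only $O(n^2)$ operations.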

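How to Transform into Triangular Form?

Start with

$$\begin{aligned} 2x + y - z &= 8 &\qquad (1)\\ -3x - y + 2z &= -11 &\qquad (2)\\ -2x + y + 2z &= -3 &\qquad (3)\end{aligned}$$

Use (1) as the pivot equation. Multiply (2) by $2/3$ and add (1); call the result (2'):

$$\begin{aligned} 2x + y - z &= 8 &\qquad (1)\\ \tfrac13 y + \tfrac13 z &= \tfrac23 &\qquad (2')\\ -2x + y + 2z &= -3 &\qquad (3)\end{aligned}$$

Multiply (3) by 1 and add (1); call the result (3'):

$$\begin{aligned} 2x + y - z &= 8 &\qquad (1)\\ \tfrac13 y + \tfrac13 z &= \tfrac23 &\qquad (2')\\ 2y + z &= 5 &\qquad (3')\end{aligned}$$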

How to Transform into Triangular Form? (2)

Now multiply (2') by $-6$ and add (3'); call the result (3''):

$$\begin{aligned} 2x + y - z &= 8 &\qquad (1)\\ \tfrac13 y + \tfrac13 z &= \tfrac23 &\qquad (2')\\ -z &= 1 &\qquad (3'')\end{aligned}$$

The system is now in triangular form. We can diagonalise it as follows:

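Diagonalising...

Multiply (2') by 3 and add (3'') to obtain (2''); multiply (3'') by $-1$ and add it to (1) to obtain (1'):

$$2x + y = 7 \;\; (1'),\qquad y = 3 \;\; (2''),\qquad -z = 1 \;\; (3'')$$

Finally, multiply (2'') by $-1$ and add it to (1'):

$$2x = 4 \;\; (1''),\qquad y = 3 \;\; (2''),\qquad -z = 1 \;\; (3'')$$

and scale the coefficients: $x = 2$, $y = 3$, $z = -1$.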

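Gaussian Elimination (3)

Usually this is done using only the coefficient scheme, the augmented matrix. The same procedure can be used to invert matrices: just append the identity matrix and perform all operations on it as well (left block: coefficient matrix, right block: identity matrix):

$$\left[\begin{array}{rrr|rrr} 2 & 1 & -1 & 1 & 0 & 0\\ -3 & -1 & 2 & 0 & 1 & 0\\ -2 & 1 & 2 & 0 & 0 & 1\end{array}\right]$$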

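Example...

A label such as $\tfrac32\cdot(1)+(2)$ means that row (2) is replaced by $\tfrac32$ times row (1) plus row (2).

$$\left[\begin{array}{rrr|rrr} 2 & 1 & -1 & 1 & 0 & 0\\ -3 & -1 & 2 & 0 & 1 & 0\\ -2 & 1 & 2 & 0 & 0 & 1\end{array}\right]
\xrightarrow[(1)+(3)]{\frac32\cdot(1)+(2)}
\left[\begin{array}{rrr|rrr} 2 & 1 & -1 & 1 & 0 & 0\\ 0 & \frac12 & \frac12 & \frac32 & 1 & 0\\ 0 & 2 & 1 & 1 & 0 & 1\end{array}\right]
\xrightarrow{-4\cdot(2)+(3)}
\left[\begin{array}{rrr|rrr} 2 & 1 & -1 & 1 & 0 & 0\\ 0 & \frac12 & \frac12 & \frac32 & 1 & 0\\ 0 & 0 & -1 & -5 & -4 & 1\end{array}\right]$$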

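Example (2)

$$\left[\begin{array}{rrr|rrr} 2 & 1 & -1 & 1 & 0 & 0\\ 0 & \frac12 & \frac12 & \frac32 & 1 & 0\\ 0 & 0 & -1 & -5 & -4 & 1\end{array}\right]
\xrightarrow[2\cdot(2)+(3)]{(1)-(3)}
\left[\begin{array}{rrr|rrr} 2 & 1 & 0 & 6 & 4 & -1\\ 0 & 1 & 0 & -2 & -2 & 1\\ 0 & 0 & -1 & -5 & -4 & 1\end{array}\right]
\xrightarrow{(1)-(2)}
\left[\begin{array}{rrr|rrr} 2 & 0 & 0 & 8 & 6 & -2\\ 0 & 1 & 0 & -2 & -2 & 1\\ 0 & 0 & -1 & -5 & -4 & 1\end{array}\right]$$

$$\xrightarrow[-1\cdot(3)]{\frac12\cdot(1),\ 1\cdot(2)}
\left[\begin{array}{rrr|rrr} 1 & 0 & 0 & 4 & 3 & -1\\ 0 & 1 & 0 & -2 & -2 & 1\\ 0 & 0 & 1 & 5 & 4 & -1\end{array}\right]$$

The right-hand block is $A^{-1}$.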

Example (3)

OK, we have found $A^{-1}$. How can we check whether it is correct? And once we have $A^{-1}$, how is it useful for finding $x$?
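One way to answer both questions: check that $AA^{-1} = I$, and then compute $x = A^{-1}b$. A minimal sketch, assuming NumPy; `gauss_jordan_inverse` is a hypothetical name. Unlike the hand calculation, it swaps rows for partial pivoting and scales each pivot row to 1 immediately, so the intermediate matrices differ but the resulting inverse is the same.

```python
import numpy as np

def gauss_jordan_inverse(A):
    """Invert A by Gauss-Jordan elimination on the augmented matrix [A | I]."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    M = np.hstack([A, np.eye(n)])
    for k in range(n):
        piv = k + int(np.argmax(np.abs(M[k:, k])))   # partial pivoting
        if np.isclose(M[piv, k], 0.0):
            raise ValueError("Matrix is (numerically) singular")
        M[[k, piv]] = M[[piv, k]]      # swap rows k and piv
        M[k] /= M[k, k]                # scale the pivot row so the pivot is 1
        for i in range(n):
            if i != k:
                M[i] -= M[i, k] * M[k]     # eliminate column k in every other row
    return M[:, n:]                    # the right-hand block is now A^{-1}

A = np.array([[2.0, 1.0, -1.0], [-3.0, -1.0, 2.0], [-2.0, 1.0, 2.0]])
b = np.array([8.0, -11.0, -3.0])
A_inv = gauss_jordan_inverse(A)
print(np.allclose(A @ A_inv, np.eye(3)))   # True: A A^{-1} = I
x = A_inv @ b
print(x)                                   # approximately [ 2.  3. -1.]
```

For solving a single system, forming $A^{-1}$ explicitly is more work than elimination on $[A \mid b]$ alone; the inverse pays off mainly when it is needed for its own sake.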

Some Remarks

- The pivot may be zero (or inconveniently small for numerical stability) -> reorder the rows.
- A small pivot means dividing by a very small number, which amplifies rounding errors; it is better to reorder the rows so that the largest possible pivot is used (partial pivoting).
- The computational complexity is of order $O(n^3)$ (roughly: for each pivot, up to $n-1$ rows need to be dealt with, each row operation involves order $n$ operations, and this has to be repeated for fewer than $n$ pivots).
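The effect of a tiny pivot can be seen on the system $\varepsilon x + y = 1$, $x + y = 2$ with $\varepsilon = 10^{-20}$, whose solution is $x \approx 1$, $y \approx 1$. A minimal sketch in plain Python (the helper `solve_2x2` is hypothetical): without a row exchange the multiplier $1/\varepsilon$ is so large that the information in the second equation is lost to rounding.

```python
def solve_2x2(a, b, pivot):
    """Solve a 2x2 system by elimination; with pivot=True, use the larger |a[i][0]| as pivot."""
    a = [row[:] for row in a]
    b = b[:]
    if pivot and abs(a[1][0]) > abs(a[0][0]):
        a[0], a[1] = a[1], a[0]          # swap the equations
        b[0], b[1] = b[1], b[0]
    m = a[1][0] / a[0][0]                # multiplier
    a11 = a[1][1] - m * a[0][1]
    b1 = b[1] - m * b[0]
    y = b1 / a11
    x = (b[0] - a[0][1] * y) / a[0][0]   # back-substitution
    return x, y

eps = 1e-20
A = [[eps, 1.0], [1.0, 1.0]]
rhs = [1.0, 2.0]
print(solve_2x2(A, rhs, pivot=False))   # (0.0, 1.0): x is completely wrong
print(solve_2x2(A, rhs, pivot=True))    # (1.0, 1.0)
```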

Pseudocode (Gaussian elimination with partial pivoting on an m x n matrix A)

    for k = 1 ... min(m, n):
        # find the k-th pivot
        i_max := argmax(i = k ... m, abs(A[i, k]))
        if A[i_max, k] = 0:
            error "Matrix is singular!"
        swap rows(k, i_max)
        # do for all rows below the pivot
        for i = k + 1 ... m:
            # do for all remaining elements in the current row
            for j = k + 1 ... n:
                A[i, j] := A[i, j] - A[k, j] * (A[i, k] / A[k, k])
            # fill the lower triangular part with zeros
            A[i, k] := 0

This turns the m x n matrix into row echelon form; applied to the augmented matrix, the resulting system can then be solved by back-substitution.
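A runnable version of the pseudocode, as a minimal sketch assuming NumPy and a square system: it eliminates on the augmented matrix $[A \mid b]$, so the right-hand side is transformed along with $A$, and finishes with back-substitution. Indices are 0-based, unlike the pseudocode, and `gaussian_elimination` is a hypothetical name.

```python
import numpy as np

def gaussian_elimination(A, b):
    """Solve Ax = b (A square) by Gaussian elimination with partial pivoting."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    M = np.hstack([A, np.asarray(b, dtype=float).reshape(n, 1)])  # [A | b]
    for k in range(n):
        # find the k-th pivot: largest |entry| in column k at or below row k
        i_max = k + int(np.argmax(np.abs(M[k:, k])))
        if M[i_max, k] == 0.0:
            raise ValueError("Matrix is singular!")
        M[[k, i_max]] = M[[i_max, k]]          # swap rows k and i_max
        for i in range(k + 1, n):              # all rows below the pivot
            factor = M[i, k] / M[k, k]
            M[i, k + 1:] -= factor * M[k, k + 1:]
            M[i, k] = 0.0                      # fill lower triangle with zeros
    # back-substitution on the upper triangular system
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (M[i, n] - M[i, i + 1:n] @ x[i + 1:]) / M[i, i]
    return x

x = gaussian_elimination([[2, 1, -1], [-3, -1, 2], [-2, 1, 2]], [8, -11, -3])
print(x)   # approximately [ 2.  3. -1.]
```

The exact-zero test mirrors the pseudocode; a production solver would rather compare the pivot against a tolerance.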

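LU Decomposition

A slightly different strategy is LU factorisation: write $A$ as a product of two triangular matrices, a lower triangular one ($L$) and an upper triangular one ($U$), $A = LU$:

$$\begin{bmatrix} a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23}\\ a_{31} & a_{32} & a_{33}\end{bmatrix} = \begin{bmatrix} l_{11} & 0 & 0\\ l_{21} & l_{22} & 0\\ l_{31} & l_{32} & l_{33}\end{bmatrix}\begin{bmatrix} u_{11} & u_{12} & u_{13}\\ 0 & u_{22} & u_{23}\\ 0 & 0 & u_{33}\end{bmatrix}$$

How does this help to solve $Ax = b$? $Ax = LUx = b$, so we can first solve $Ly = b$ and then $Ux = y$.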

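LU Decomposition (2)

$Ly = b$ is easy to solve because

$$\begin{bmatrix} l_{11} & 0 & 0\\ l_{21} & l_{22} & 0\\ l_{31} & l_{32} & l_{33}\end{bmatrix}\begin{bmatrix} y_1\\ y_2\\ y_3\end{bmatrix} = \begin{bmatrix} l_{11}y_1\\ l_{21}y_1 + l_{22}y_2\\ l_{31}y_1 + l_{32}y_2 + l_{33}y_3\end{bmatrix} = \begin{bmatrix} b_1\\ b_2\\ b_3\end{bmatrix},$$

so we can solve it by forward substitution (first $y_1$, then $y_2$, then $y_3$). Similarly, $Ux = y$ is easy to solve by back-substitution:

$$\begin{bmatrix} u_{11} & u_{12} & u_{13}\\ 0 & u_{22} & u_{23}\\ 0 & 0 & u_{33}\end{bmatrix}\begin{bmatrix} x_1\\ x_2\\ x_3\end{bmatrix} = \begin{bmatrix} u_{11}x_1 + u_{12}x_2 + u_{13}x_3\\ u_{22}x_2 + u_{23}x_3\\ u_{33}x_3\end{bmatrix} = \begin{bmatrix} y_1\\ y_2\\ y_3\end{bmatrix}$$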
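A minimal NumPy sketch of the two triangular solves (function names are hypothetical). The factors below are an LU factorisation of the Gaussian-elimination example matrix, computed by hand and hard-coded so the block stands alone; they satisfy `L @ U == A`.

```python
import numpy as np

def forward_substitution(L, b):
    """Solve Ly = b for lower triangular L, first equation first."""
    n = len(b)
    y = np.zeros(n)
    for i in range(n):
        y[i] = (b[i] - L[i, :i] @ y[:i]) / L[i, i]
    return y

def back_substitution(U, y):
    """Solve Ux = y for upper triangular U, last equation first."""
    n = len(y)
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return x

L = np.array([[1.0, 0.0, 0.0], [-1.5, 1.0, 0.0], [-1.0, 4.0, 1.0]])
U = np.array([[2.0, 1.0, -1.0], [0.0, 0.5, 0.5], [0.0, 0.0, -1.0]])
b = np.array([8.0, -11.0, -3.0])

y = forward_substitution(L, b)   # [8, 1, 1]
x = back_substitution(U, y)      # [2, 3, -1]
print(y, x)
```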

LU Decomposition (3)

This is essentially a variant of Gaussian elimination. As in Gaussian elimination, we might have to exchange
- rows (partial pivoting): $PA = LU$,
- rows and columns (full pivoting): $PAQ = LU$,

where $P$ is a permutation matrix that keeps track of the row permutations and $Q$ of the column permutations. Any square matrix admits an LUP factorisation. For symmetric (or Hermitian) positive definite matrices there is a similar decomposition, the Cholesky decomposition $A = LL^*$ (often used in statistics).
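For the Cholesky case, NumPy provides `numpy.linalg.cholesky`, which returns the lower triangular factor $L$ with $A = LL^*$ and raises `numpy.linalg.LinAlgError` if the matrix is not positive definite. A minimal sketch with a small symmetric positive definite matrix whose values are chosen arbitrarily for illustration:

```python
import numpy as np

S = np.array([[4.0, 2.0],
              [2.0, 3.0]])          # symmetric positive definite (example values)
Lc = np.linalg.cholesky(S)          # lower triangular: [[2, 0], [1, sqrt(2)]]
assert np.allclose(Lc @ Lc.T, S)    # S = L L^T (L^* = L^T for real matrices)

try:
    np.linalg.cholesky(np.array([[1.0, 2.0], [2.0, 1.0]]))   # symmetric but indefinite
except np.linalg.LinAlgError:
    print("not positive definite")
```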

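Doolittle Algorithm (1)

One algorithm to generate the LU decomposition (LUP if rows are exchanged). Similarly to Gaussian elimination, we aim to remove the entries of $A$ below the diagonal, which is achieved by multiplying $A$ from the left with an appropriate matrix $L'$. E.g. for

$$A = \begin{bmatrix} a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23}\\ a_{31} & a_{32} & a_{33}\end{bmatrix}:$$

multiply the first row by $-a_{21}/a_{11}$ and add it to the second row; multiply the first row by $-a_{31}/a_{11}$ and add it to the third row. This is equivalent to multiplying $A$ from the left by $L'$:

$$\begin{bmatrix} 1 & 0 & 0\\ -a_{21}/a_{11} & 1 & 0\\ -a_{31}/a_{11} & 0 & 1\end{bmatrix}\begin{bmatrix} a_{11} & a_{12} & a_{13}\\ a_{21} & a_{22} & a_{23}\\ a_{31} & a_{32} & a_{33}\end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} & a_{13}\\ 0 & a_{22} - \frac{a_{21}}{a_{11}}a_{12} & a_{23} - \frac{a_{21}}{a_{11}}a_{13}\\ 0 & a_{32} - \frac{a_{31}}{a_{11}}a_{12} & a_{33} - \frac{a_{31}}{a_{11}}a_{13}\end{bmatrix}$$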

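Doolittle Algorithm (2)

Start with the matrix $A^{(0)} = A$. At stage $n$ ($n = 1, \dots, N-1$), left-multiply $A^{(n-1)}$ by a matrix $L_n$ that has diagonal entries 1 and non-zero off-diagonal entries only in the $n$th column below the diagonal. These entries are

$$(L_n)_{i,n} = -l_{i,n},\qquad l_{i,n} = \frac{a^{(n-1)}_{i,n}}{a^{(n-1)}_{n,n}},\qquad i = n+1, \dots, N.$$

In $N-1$ steps we obtain an upper triangular matrix

$$U = A^{(N-1)} = L_{N-1} \cdots L_2 L_1 A.$$

We have

$$A = L_1^{-1} L_2^{-1} \cdots L_{N-1}^{-1} A^{(N-1)}.$$

Since products and inverses of lower triangular matrices are lower triangular, we have

$$L = L_1^{-1} L_2^{-1} \cdots L_{N-1}^{-1} \quad (*) \qquad\text{and}\qquad A = LU.$$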

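Doolittle (3)

We still need to determine $L$ from (*). Written out (all omitted entries are 0),

$$L_n = \begin{bmatrix} 1 & & & & & \\ & \ddots & & & & \\ & & 1 & & & \\ & & -l_{n+1,n} & 1 & & \\ & & \vdots & & \ddots & \\ & & -l_{N,n} & & & 1\end{bmatrix},
\qquad
L_n^{-1} = \begin{bmatrix} 1 & & & & & \\ & \ddots & & & & \\ & & 1 & & & \\ & & l_{n+1,n} & 1 & & \\ & & \vdots & & \ddots & \\ & & l_{N,n} & & & 1\end{bmatrix},$$

i.e. $L_n^{-1}$ is obtained from $L_n$ by flipping the signs of the entries below the diagonal.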

Doolittle (4)

And hence

$$L = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0\\ l_{21} & 1 & 0 & \cdots & 0\\ l_{31} & l_{32} & 1 & \cdots & 0\\ \vdots & \vdots & & \ddots & \vdots\\ l_{N1} & l_{N2} & \cdots & l_{N,N-1} & 1\end{bmatrix}$$
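The construction translates into a short loop: at stage $n$, compute the multipliers $l_{i,n}$ and subtract $l_{i,n}$ times row $n$ from each row below it, which is what left-multiplying by $L_n$ does; the multipliers are stored directly in $L$. A minimal sketch, assuming NumPy and no pivoting (so it stops at a zero pivot, where row exchanges, i.e. LUP, would be needed); `doolittle` is a hypothetical name and indices are 0-based.

```python
import numpy as np

def doolittle(A):
    """LU factorisation without pivoting: A = L U with unit lower triangular L."""
    U = np.array(A, dtype=float)    # becomes A^(1), A^(2), ... and finally U
    N = U.shape[0]
    L = np.eye(N)
    for n in range(N - 1):
        if U[n, n] == 0.0:
            raise ValueError("zero pivot: row exchanges (LUP) would be needed")
        for i in range(n + 1, N):
            L[i, n] = U[i, n] / U[n, n]               # l_{i,n} = a_{i,n} / a_{n,n}
            U[i, n + 1:] -= L[i, n] * U[n, n + 1:]    # row i <- row i - l_{i,n} * row n
            U[i, n] = 0.0                             # entry below the diagonal is eliminated
    return L, U

A = np.array([[2.0, 1.0, -1.0], [-3.0, -1.0, 2.0], [-2.0, 1.0, 2.0]])
L, U = doolittle(A)
print(L)   # [[1, 0, 0], [-1.5, 1, 0], [-1, 4, 1]]
print(U)   # [[2, 1, -1], [0, 0.5, 0.5], [0, 0, -1]]
```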

Beyond Doolittle...

LU decomposition has some slight advantages over Gaussian elimination: once we have $L$ and $U$, we can solve the system for any right-hand side $b$ using only forward and back-substitution. Small modifications can also be used to calculate the matrix inverse. Generalisations are e.g. the Crout algorithm or the LUP decomposition (the Wikipedia pages are very good on this topic and give many more advanced references: http://en.wikipedia.org/wiki/lu_decomposition).
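To illustrate the reuse of the factors: factorise once ($O(n^3)$), then every further right-hand side costs only a forward and a back-substitution ($O(n^2)$). Using the columns of the identity matrix as right-hand sides yields the inverse column by column. A minimal sketch, assuming NumPy, no pivoting and a unit-diagonal $L$ as in Doolittle; `lu_factor_nopivot` and `solve_with_lu` are hypothetical names, not library functions.

```python
import numpy as np

def lu_factor_nopivot(A):
    """Doolittle LU factorisation without pivoting (L has unit diagonal)."""
    U = np.array(A, dtype=float)
    N = U.shape[0]
    L = np.eye(N)
    for n in range(N - 1):
        L[n + 1:, n] = U[n + 1:, n] / U[n, n]              # multipliers l_{i,n}
        U[n + 1:, n:] -= np.outer(L[n + 1:, n], U[n, n:])  # eliminate below the pivot
    return L, U

def solve_with_lu(L, U, b):
    """Forward substitution (Ly = b), then back-substitution (Ux = y)."""
    N = len(b)
    y = np.zeros(N)
    x = np.zeros(N)
    for i in range(N):
        y[i] = b[i] - L[i, :i] @ y[:i]          # unit diagonal: no division needed
    for i in range(N - 1, -1, -1):
        x[i] = (y[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return x

A = [[2, 1, -1], [-3, -1, 2], [-2, 1, 2]]
L, U = lu_factor_nopivot(A)                       # done once

x = solve_with_lu(L, U, [8, -11, -3])             # one right-hand side
A_inv = np.column_stack([solve_with_lu(L, U, e)   # columns of I as right-hand sides
                         for e in np.eye(3)])
print(x)       # approximately [ 2.  3. -1.]
print(A_inv)   # approximately [[4, 3, -1], [-2, -2, 1], [5, 4, -1]]
```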

Iterative Methods

The time complexity of LU decomposition or Gaussian elimination is $O(n^3)$, i.e. dealing with millions of equations (which arise when solving certain PDEs) can take a long time. There are faster methods that only produce approximate solutions: iterative methods. Their trade-off is that they might not converge for general coefficient matrices $A$. For diagonally dominant matrices, a widely used method is the Jacobi method.

Jacobi Method

We want to solve

$$Ax = b \qquad (*)$$

Write $A = D + R$, where $D$ contains the diagonal elements of $A$ and $R$ the rest. Diagonally dominant means that $D$ is large compared to $R$, e.g. $|a_{ii}| > \sum_{j \neq i} |a_{ij}|$ for all $i$. Write (*) as

$$Ax = (D + R)x = b \quad\Rightarrow\quad x = D^{-1}(b - Rx)$$

and iterate this:

$$x_{n+1} = D^{-1}(b - Rx_n),$$

where $n$ is the iteration number.

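Example Jacobi

Consider the system

$$2x + y = 11,\qquad 5x + 7y = 13,\qquad A = \begin{bmatrix} 2 & 1\\ 5 & 7\end{bmatrix},\quad b = \begin{bmatrix} 11\\ 13\end{bmatrix}.$$

Jacobi: $x_{n+1} = D^{-1}(b - Rx_n)$, say $x_{n+1} = c - Tx_n$, with $x_0 = [1, 1]^T$ and

$$D = \begin{bmatrix} 2 & 0\\ 0 & 7\end{bmatrix},\quad R = \begin{bmatrix} 0 & 1\\ 5 & 0\end{bmatrix},\quad D^{-1} = \begin{bmatrix} 1/2 & 0\\ 0 & 1/7\end{bmatrix},$$

$$c = D^{-1}b = [11/2,\ 13/7]^T,\qquad T = D^{-1}R = \begin{bmatrix} 0 & 1/2\\ 5/7 & 0\end{bmatrix}.$$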

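Example Jacobi (2)

Hence we have to iterate

$$x_{n+1} = c - Tx_n = \begin{bmatrix} 11/2\\ 13/7\end{bmatrix} - \begin{bmatrix} 0 & 1/2\\ 5/7 & 0\end{bmatrix}x_n.$$

With $x_0 = [1, 1]^T$:

$$x_1 = \begin{bmatrix} 11/2\\ 13/7\end{bmatrix} - \begin{bmatrix} 0 & 1/2\\ 5/7 & 0\end{bmatrix}\begin{bmatrix} 1\\ 1\end{bmatrix} = \begin{bmatrix} 5\\ 8/7\end{bmatrix}.$$

Iterating 25 times one obtains $x \approx [7.111, -3.222]^T$. Exact: $x = 64/9 = 7.1111\ldots$, $y = -29/9 = -3.2222\ldots$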
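The iteration can be reproduced in a few lines. A minimal sketch, assuming NumPy; `jacobi` and `is_diagonally_dominant` are hypothetical names, and a fixed number of iterations is used instead of a convergence test to mirror the example.

```python
import numpy as np

def is_diagonally_dominant(A):
    """Check |a_ii| > sum_{j != i} |a_ij| for every row."""
    A = np.abs(np.asarray(A, dtype=float))
    d = np.diag(A)
    return bool(np.all(d > A.sum(axis=1) - d))

def jacobi(A, b, x0, num_iter):
    """Jacobi iteration x_{n+1} = D^{-1} (b - R x_n) with A = D + R."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    d = np.diag(A)                  # diagonal entries of A (the matrix D)
    R = A - np.diag(d)              # off-diagonal part
    x = np.asarray(x0, dtype=float)
    for _ in range(num_iter):
        x = (b - R @ x) / d         # dividing by d applies D^{-1}
    return x

A = [[2, 1], [5, 7]]
b = [11, 13]
print(is_diagonally_dominant(A))        # True
print(jacobi(A, b, [1, 1], 1))          # first step: [5, 8/7]
print(jacobi(A, b, [1, 1], 25))         # approximately [7.111, -3.222]
```

The error shrinks by a factor $\sqrt{5/14} \approx 0.6$ per step here (the spectral radius of $T$), which is why about 25 iterations are needed for three correct decimals.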

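The Gauss-Seidel Method

Convergence of the Jacobi method is only guaranteed for certain matrices, e.g. (strictly) diagonally dominant ones. Gauss-Seidel can be applied to general matrices, but convergence is only guaranteed if $A$ is (strictly) diagonally dominant or symmetric and positive definite. Idea: write $A = L + U$, with $L$ lower triangular (including the diagonal) and $U$ strictly upper triangular, and iterate

$$x_{n+1} = L^{-1}(b - Ux_n).$$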
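For comparison, a minimal Gauss-Seidel sketch, assuming NumPy (`gauss_seidel` is a hypothetical name). Sweeping through the equations and immediately reusing the already updated components is the same as solving the lower triangular system $Lx_{n+1} = b - Ux_n$ by forward substitution, so $L^{-1}$ is never formed explicitly. On the Jacobi example it converges noticeably faster.

```python
import numpy as np

def gauss_seidel(A, b, x0, num_iter):
    """Gauss-Seidel iteration x_{n+1} = L^{-1} (b - U x_n) with A = L + U."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    x = np.array(x0, dtype=float)
    n = len(b)
    for _ in range(num_iter):
        for i in range(n):
            # x[:i] already holds new values (lower part), x[i+1:] old ones (upper part)
            s = A[i, :i] @ x[:i] + A[i, i + 1:] @ x[i + 1:]
            x[i] = (b[i] - s) / A[i, i]
    return x

A = [[2, 1], [5, 7]]
b = [11, 13]
print(gauss_seidel(A, b, [1, 1], 25))   # approximately [7.1111, -3.2222]
```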

Software Packages

- LINPACK: package for solving a wide variety of linear systems and special systems (e.g. symmetric, banded, etc.). It has become a standard benchmark for comparing the performance of computers.
- LAPACK: more recent replacement of LINPACK with higher performance on modern computer architectures, including some parallel computers.

Available from Netlib: http://www.netlib.org/
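In practice one calls such a library rather than writing the elimination by hand. For example, NumPy's `numpy.linalg.solve` is documented as computing the solution with the LAPACK routine `_gesv` (LU factorisation with partial pivoting); SciPy additionally exposes the factorisation itself via `scipy.linalg.lu_factor` and `scipy.linalg.lu_solve`. A minimal sketch with the Gaussian-elimination example:

```python
import numpy as np

A = np.array([[2.0, 1.0, -1.0],
              [-3.0, -1.0, 2.0],
              [-2.0, 1.0, 2.0]])
b = np.array([8.0, -11.0, -3.0])

x = np.linalg.solve(A, b)        # LAPACK-backed direct solve
print(x)                         # approximately [ 2.  3. -1.]
print(np.allclose(A @ x, b))     # True
```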

Summary

- Exact methods to solve systems of linear equations: Gaussian elimination, LU decomposition (e.g. the Doolittle algorithm). With full/partial pivoting they are stable in practice; cost $O(n^3)$.
- Iterative methods, e.g. the Jacobi method: faster, suited to the millions of equations that might arise when solving certain PDEs, but with limited convergence. More sophisticated: weighted Jacobi, Gauss-Seidel, successive over-relaxation, ...

References

The English Wikipedia pages are quite comprehensive on this topic (and I used them quite a bit for preparing the lecture). More comprehensive:
- David M. Young, Jr., Iterative Solution of Large Linear Systems, Academic Press, 1971 (reprinted by Dover, 2003).
- Richard S. Varga, Matrix Iterative Analysis, second edition (of the 1962 Prentice Hall edition), Springer-Verlag, 2002.