Chapter 3 - From Gaussian Elimination to LU Factorization

Maggie Myers and Robert A. van de Geijn
The University of Texas at Austin
Practical Linear Algebra, Fall 2009
http://z.cs.utexas.edu/wiki/pla.wiki/

Gaussian Elimination - Take 1

Consider the system of linear equations
\[
\begin{aligned}
2x + 4y - 2z &= -10 \\
4x - 2y + 6z &= 20 \\
6x - 4y + 2z &= 18
\end{aligned}
\]
Notice that $x$, $y$, and $z$ are just variables, for which we can pick any name we want. To be consistent with the notation we introduced previously for naming components of vectors, we use the names $\chi_0$, $\chi_1$, and $\chi_2$ instead of $x$, $y$, and $z$, respectively:
\[
\begin{aligned}
2\chi_0 + 4\chi_1 - 2\chi_2 &= -10 \\
4\chi_0 - 2\chi_1 + 6\chi_2 &= 20 \\
6\chi_0 - 4\chi_1 + 2\chi_2 &= 18
\end{aligned}
\]

\[
\begin{aligned}
2\chi_0 + 4\chi_1 - 2\chi_2 &= -10 \\
4\chi_0 - 2\chi_1 + 6\chi_2 &= 20 \\
6\chi_0 - 4\chi_1 + 2\chi_2 &= 18
\end{aligned}
\]
Solving this linear system relies on the fact that its solution does not change if
1. equations are reordered (not actually used in this example); and/or
2. an equation in the system is modified by subtracting a multiple of another equation in the system from it; and/or
3. both sides of an equation in the system are scaled by a nonzero scalar.

Example: Gaussian Elimination
The following steps are known as Gaussian elimination. They transform a system of linear equations into an equivalent upper triangular system of linear equations.

Subtract $\lambda_{1,0} = (4/2) = 2$ times the first equation from the second equation:

Before:
\[
\begin{aligned}
2\chi_0 + 4\chi_1 - 2\chi_2 &= -10 \\
4\chi_0 - 2\chi_1 + 6\chi_2 &= 20 \\
6\chi_0 - 4\chi_1 + 2\chi_2 &= 18
\end{aligned}
\]
After:
\[
\begin{aligned}
2\chi_0 + 4\chi_1 - 2\chi_2 &= -10 \\
-10\chi_1 + 10\chi_2 &= 40 \\
6\chi_0 - 4\chi_1 + 2\chi_2 &= 18
\end{aligned}
\]

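Subtract $\lambda_{2,0} = (6/2) = 3$ times the first equation from the third equation:

Before:
\[
\begin{aligned}
2\chi_0 + 4\chi_1 - 2\chi_2 &= -10 \\
-10\chi_1 + 10\chi_2 &= 40 \\
6\chi_0 - 4\chi_1 + 2\chi_2 &= 18
\end{aligned}
\]
After:
\[
\begin{aligned}
2\chi_0 + 4\chi_1 - 2\chi_2 &= -10 \\
-10\chi_1 + 10\chi_2 &= 40 \\
-16\chi_1 + 8\chi_2 &= 48
\end{aligned}
\]
Subtract $\lambda_{2,1} = ((-16)/(-10)) = 1.6$ times the second equation from the third equation:

Before:
\[
\begin{aligned}
2\chi_0 + 4\chi_1 - 2\chi_2 &= -10 \\
-10\chi_1 + 10\chi_2 &= 40 \\
-16\chi_1 + 8\chi_2 &= 48
\end{aligned}
\]
After:
\[
\begin{aligned}
2\chi_0 + 4\chi_1 - 2\chi_2 &= -10 \\
-10\chi_1 + 10\chi_2 &= 40 \\
-8\chi_2 &= -16
\end{aligned}
\]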

This now leaves us with an upper triangular system of linear equations.

Multipliers
In the above Gaussian elimination procedure, $\lambda_{1,0}$, $\lambda_{2,0}$, and $\lambda_{2,1}$ are called the multipliers.
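
The elimination steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `eliminate` and the dictionary of multipliers are choices made here, not part of the slides, and the sketch assumes every pivot it meets is nonzero. Exact fractions are used so that $\lambda_{2,1} = 1.6$ is represented without rounding.

```python
from fractions import Fraction

def eliminate(aug):
    """Reduce an augmented array [A | b] to upper triangular form.

    Returns the reduced rows and a dict mapping (i, j) to the multiplier
    lambda_{i,j}. Assumes every pivot encountered is nonzero.
    """
    rows = [[Fraction(v) for v in row] for row in aug]
    n = len(rows)
    multipliers = {}
    for j in range(n - 1):            # column in which zeros are introduced
        for i in range(j + 1, n):     # rows below the diagonal
            lam = rows[i][j] / rows[j][j]
            multipliers[(i, j)] = lam
            # subtract lam times row j from row i
            rows[i] = [a - lam * p for a, p in zip(rows[i], rows[j])]
    return rows, multipliers

aug = [[2, 4, -2, -10],
       [4, -2, 6, 20],
       [6, -4, 2, 18]]
upper, multipliers = eliminate(aug)
```

Here `multipliers[(1, 0)]`, `multipliers[(2, 0)]`, and `multipliers[(2, 1)]` play the roles of $\lambda_{1,0}$, $\lambda_{2,0}$, and $\lambda_{2,1}$.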

Back substitution
\[
\begin{aligned}
2\chi_0 + 4\chi_1 - 2\chi_2 &= -10 \\
-10\chi_1 + 10\chi_2 &= 40 \\
-8\chi_2 &= -16
\end{aligned}
\]
Solve the last equation: $\chi_2 = (-16)/(-8) = 2$.
Substitute $\chi_2 = 2$ into the second equation and solve: $\chi_1 = (40 - 10(2))/(-10) = -2$.
Substitute $\chi_2 = 2$ and $\chi_1 = -2$ into the first equation and solve: $\chi_0 = (-10 - (4(-2) + (-2)(2)))/2 = 1$.
Thus, the solution is the vector
\[
x = \begin{pmatrix} \chi_0 \\ \chi_1 \\ \chi_2 \end{pmatrix} = \begin{pmatrix} 1 \\ -2 \\ 2 \end{pmatrix}.
\]
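
Back substitution as carried out above can be sketched as follows. The function name `back_substitute` is an assumption made for this illustration; the sketch assumes the matrix is upper triangular with nonzero diagonal elements.

```python
from fractions import Fraction

def back_substitute(U, b):
    """Solve U x = b for upper triangular U with nonzero diagonal."""
    n = len(U)
    x = [Fraction(0)] * n
    for i in range(n - 1, -1, -1):    # last equation first
        # contribution of the components that are already known
        known = sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (Fraction(b[i]) - known) / U[i][i]
    return x

U = [[2, 4, -2],
     [0, -10, 10],
     [0, 0, -8]]
b_hat = [-10, 40, -16]
x = back_substitute(U, b_hat)
```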

Gaussian Elimination - Take 2

It becomes very cumbersome to always write the entire equation. The information is encoded in the coefficients in front of the $\chi_i$ variables and in the values to the right of the equal signs. We could just let
\[
\left(\begin{array}{rrr|r} 2 & 4 & -2 & -10 \\ 4 & -2 & 6 & 20 \\ 6 & -4 & 2 & 18 \end{array}\right)
\]
represent
\[
\begin{aligned}
2\chi_0 + 4\chi_1 - 2\chi_2 &= -10 \\
4\chi_0 - 2\chi_1 + 6\chi_2 &= 20 \\
6\chi_0 - 4\chi_1 + 2\chi_2 &= 18
\end{aligned}
\]
Then Gaussian elimination can simply work with this array of numbers.

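Initial system of equations:
\[
\left(\begin{array}{rrr|r} 2 & 4 & -2 & -10 \\ 4 & -2 & 6 & 20 \\ 6 & -4 & 2 & 18 \end{array}\right)
\]
Subtract $\lambda_{1,0} = (4/2) = 2$ times the first row from the second row:
\[
\left(\begin{array}{rrr|r} 2 & 4 & -2 & -10 \\ 0 & -10 & 10 & 40 \\ 6 & -4 & 2 & 18 \end{array}\right)
\]
Subtract $\lambda_{2,0} = (6/2) = 3$ times the first row from the third row:
\[
\left(\begin{array}{rrr|r} 2 & 4 & -2 & -10 \\ 0 & -10 & 10 & 40 \\ 0 & -16 & 8 & 48 \end{array}\right)
\]
Subtract $\lambda_{2,1} = ((-16)/(-10)) = 1.6$ times the second row from the third row:
\[
\left(\begin{array}{rrr|r} 2 & 4 & -2 & -10 \\ 0 & -10 & 10 & 40 \\ 0 & 0 & -8 & -16 \end{array}\right)
\]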

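Back substitution
\[
\left(\begin{array}{rrr|r} 2 & 4 & -2 & -10 \\ 0 & -10 & 10 & 40 \\ 0 & 0 & -8 & -16 \end{array}\right)
\]
The last row is shorthand for $-8\chi_2 = -16$, which implies $\chi_2 = (-16)/(-8) = 2$.
The second row is shorthand for $-10\chi_1 + 10\chi_2 = 40$, which implies $-10\chi_1 + 10(2) = 40$ and hence $\chi_1 = (40 - 10(2))/(-10) = -2$.
The first row is shorthand for $2\chi_0 + 4\chi_1 - 2\chi_2 = -10$, which implies $2\chi_0 + 4(-2) - 2(2) = -10$ and hence $\chi_0 = (-10 - 4(-2) + 2(2))/2 = 1$.
The solution equals
\[
x = \begin{pmatrix} \chi_0 \\ \chi_1 \\ \chi_2 \end{pmatrix} = \begin{pmatrix} 1 \\ -2 \\ 2 \end{pmatrix}.
\]
Check the answer (by plugging $\chi_0 = 1$, $\chi_1 = -2$, and $\chi_2 = 2$ into the original system):
\[
\begin{aligned}
2(1) + 4(-2) - 2(2) &= -10 \\
4(1) - 2(-2) + 6(2) &= 20 \\
6(1) - 4(-2) + 2(2) &= 18
\end{aligned}
\]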

Observations
The above discussion motivates storing only the coefficients of a linear system (the numbers to the left of the vertical bar $|$) as a two dimensional array and the numbers to the right as a one dimensional array. We recognize this two dimensional array as a matrix: $A \in \mathbb{R}^{m \times n}$ is the two dimensional array of scalars
\[
A = \begin{pmatrix}
\alpha_{0,0} & \alpha_{0,1} & \cdots & \alpha_{0,n-1} \\
\alpha_{1,0} & \alpha_{1,1} & \cdots & \alpha_{1,n-1} \\
\vdots & \vdots & & \vdots \\
\alpha_{m-1,0} & \alpha_{m-1,1} & \cdots & \alpha_{m-1,n-1}
\end{pmatrix},
\]
where $\alpha_{i,j} \in \mathbb{R}$ for $0 \le i < m$ and $0 \le j < n$. It has $m$ rows and $n$ columns. Note that the parentheses are simply there to delimit the array rather than having any special meaning.

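Observations (continued)
We similarly recognize that the one dimensional array is a (column) vector $x \in \mathbb{R}^n$, where
\[
x = \begin{pmatrix} \chi_0 \\ \chi_1 \\ \vdots \\ \chi_{n-1} \end{pmatrix}.
\]
The length of the vector is $n$.
Now, given $A \in \mathbb{R}^{m \times n}$ and vector $x \in \mathbb{R}^n$, the notation $Ax$ stands for
\[
\begin{pmatrix}
\alpha_{0,0}\chi_0 + \alpha_{0,1}\chi_1 + \cdots + \alpha_{0,n-1}\chi_{n-1} \\
\alpha_{1,0}\chi_0 + \alpha_{1,1}\chi_1 + \cdots + \alpha_{1,n-1}\chi_{n-1} \\
\vdots \\
\alpha_{m-1,0}\chi_0 + \alpha_{m-1,1}\chi_1 + \cdots + \alpha_{m-1,n-1}\chi_{n-1}
\end{pmatrix}.
\]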
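
The definition of $Ax$ translates directly into code. The following is a minimal sketch, with the hypothetical name `matvec`, that stores $A$ as a list of rows. Applying it to the coefficient matrix of the running example and the solution $(1, -2, 2)$ reproduces the right-hand side of the system.

```python
def matvec(A, x):
    """Return Ax: component i is the sum over j of alpha_{i,j} * chi_j."""
    return [sum(alpha * chi for alpha, chi in zip(row, x)) for row in A]

A = [[2, 4, -2],
     [4, -2, 6],
     [6, -4, 2]]
x = [1, -2, 2]
b = matvec(A, x)
```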

Gaussian Elimination - Take 3

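Example
\[
\begin{pmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 2 & 4 & -2 \\ 4 & -2 & 6 \\ 6 & -4 & 2 \end{pmatrix}
=
\begin{pmatrix} 2 & 4 & -2 \\ 0 & -10 & 10 \\ 6 & -4 & 2 \end{pmatrix}.
\]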

Exercise
Compute
\[
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -3 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 2 & 4 & -2 \\ 0 & -10 & 10 \\ 6 & -4 & 2 \end{pmatrix}.
\]
How can this be described as an axpy operation?

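\[
\begin{pmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
\left(\begin{array}{rrr|r} 2 & 4 & -2 & -10 \\ 4 & -2 & 6 & 20 \\ 6 & -4 & 2 & 18 \end{array}\right)
=
\left(\begin{array}{rrr|r} 2 & 4 & -2 & -10 \\ 0 & -10 & 10 & 40 \\ 6 & -4 & 2 & 18 \end{array}\right)
\]
\[
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -3 & 0 & 1 \end{pmatrix}
\left(\begin{array}{rrr|r} 2 & 4 & -2 & -10 \\ 0 & -10 & 10 & 40 \\ 6 & -4 & 2 & 18 \end{array}\right)
=
\left(\begin{array}{rrr|r} 2 & 4 & -2 & -10 \\ 0 & -10 & 10 & 40 \\ 0 & -16 & 8 & 48 \end{array}\right)
\]
\[
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -1.6 & 1 \end{pmatrix}
\left(\begin{array}{rrr|r} 2 & 4 & -2 & -10 \\ 0 & -10 & 10 & 40 \\ 0 & -16 & 8 & 48 \end{array}\right)
=
\left(\begin{array}{rrr|r} 2 & 4 & -2 & -10 \\ 0 & -10 & 10 & 40 \\ 0 & 0 & -8 & -16 \end{array}\right)
\]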

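The same three steps applied to the coefficient matrix only, with each multiplier stored in the position where it introduced a zero. A boxed entry is a stored multiplier; the corresponding element of the matrix itself is zero.
\[
\begin{pmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 2 & 4 & -2 \\ 4 & -2 & 6 \\ 6 & -4 & 2 \end{pmatrix}
=
\begin{pmatrix} 2 & 4 & -2 \\ \boxed{2} & -10 & 10 \\ 6 & -4 & 2 \end{pmatrix}
\]
\[
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -3 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 2 & 4 & -2 \\ \boxed{2} & -10 & 10 \\ 6 & -4 & 2 \end{pmatrix}
=
\begin{pmatrix} 2 & 4 & -2 \\ \boxed{2} & -10 & 10 \\ \boxed{3} & -16 & 8 \end{pmatrix}
\]
\[
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -1.6 & 1 \end{pmatrix}
\begin{pmatrix} 2 & 4 & -2 \\ \boxed{2} & -10 & 10 \\ \boxed{3} & -16 & 8 \end{pmatrix}
=
\begin{pmatrix} 2 & 4 & -2 \\ \boxed{2} & -10 & 10 \\ \boxed{3} & \boxed{1.6} & -8 \end{pmatrix}
\]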

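The same Gauss transforms applied to the right-hand side:
\[
\begin{pmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} -10 \\ 20 \\ 18 \end{pmatrix}
=
\begin{pmatrix} -10 \\ 40 \\ 18 \end{pmatrix}
\]
\[
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -3 & 0 & 1 \end{pmatrix}
\begin{pmatrix} -10 \\ 40 \\ 18 \end{pmatrix}
=
\begin{pmatrix} -10 \\ 40 \\ 48 \end{pmatrix}
\]
\[
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -1.6 & 1 \end{pmatrix}
\begin{pmatrix} -10 \\ 40 \\ 48 \end{pmatrix}
=
\begin{pmatrix} -10 \\ 40 \\ -16 \end{pmatrix}
\]
Back substitution as before.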

Gaussian Elimination - Take 4

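Example
\[
\begin{pmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ -3 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 2 & 4 & -2 \\ 4 & -2 & 6 \\ 6 & -4 & 2 \end{pmatrix}
=
\begin{pmatrix} 2 & 4 & -2 \\ 0 & -10 & 10 \\ 0 & -16 & 8 \end{pmatrix}
\]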

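With both multipliers for the first column combined into one Gauss transform, and each multiplier stored in the position where it introduced a zero (a boxed entry is a stored multiplier; the corresponding element of the matrix itself is zero):
\[
\begin{pmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ -3 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 2 & 4 & -2 \\ 4 & -2 & 6 \\ 6 & -4 & 2 \end{pmatrix}
=
\begin{pmatrix} 2 & 4 & -2 \\ \boxed{2} & -10 & 10 \\ \boxed{3} & -16 & 8 \end{pmatrix}
\]
\[
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -1.6 & 1 \end{pmatrix}
\begin{pmatrix} 2 & 4 & -2 \\ \boxed{2} & -10 & 10 \\ \boxed{3} & -16 & 8 \end{pmatrix}
=
\begin{pmatrix} 2 & 4 & -2 \\ \boxed{2} & -10 & 10 \\ \boxed{3} & \boxed{1.6} & -8 \end{pmatrix}
\]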

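Forward substitution
\[
\begin{pmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ -3 & 0 & 1 \end{pmatrix}
\begin{pmatrix} -10 \\ 20 \\ 18 \end{pmatrix}
=
\begin{pmatrix} -10 \\ 40 \\ 48 \end{pmatrix}
\]
\[
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -1.6 & 1 \end{pmatrix}
\begin{pmatrix} -10 \\ 40 \\ 48 \end{pmatrix}
=
\begin{pmatrix} -10 \\ 40 \\ -16 \end{pmatrix}
\]
Back substitution as before.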
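
Forward substitution does not need the Gauss transforms as explicit matrices; the multipliers are enough. The sketch below assumes the multipliers are passed column by column in a list of lists, a layout chosen here for illustration only. The name `forward_substitute` is likewise hypothetical.

```python
from fractions import Fraction

def forward_substitute(multipliers, b):
    """Apply the Gauss transforms to b, one column of multipliers at a time.

    multipliers[j] is the list [lambda_{j+1,j}, ..., lambda_{n-1,j}].
    """
    y = [Fraction(v) for v in b]
    for j, column in enumerate(multipliers):
        for offset, lam in enumerate(column):
            i = j + 1 + offset
            y[i] -= lam * y[j]        # subtract lam times component j
    return y

multipliers = [[Fraction(2), Fraction(3)],   # column 0
               [Fraction(8, 5)]]             # column 1 (1.6)
b_hat = forward_substitute(multipliers, [-10, 20, 18])
```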

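Theorem
Let $\hat{L}_j$ be a matrix that equals the identity, except that for $i > j$ the $(i, j)$ elements (the ones below the diagonal in the $j$th column) have been replaced with $-\lambda_{i,j}$:
\[
\hat{L}_j = \begin{pmatrix}
I_j & 0 & 0 & 0 & \cdots & 0 \\
0 & 1 & 0 & 0 & \cdots & 0 \\
0 & -\lambda_{j+1,j} & 1 & 0 & \cdots & 0 \\
0 & -\lambda_{j+2,j} & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\
0 & -\lambda_{m-1,j} & 0 & 0 & \cdots & 1
\end{pmatrix}.
\]
Then $\hat{L}_j A$ equals the matrix $A$ except that for $i > j$ the $i$th row is modified by subtracting $\lambda_{i,j}$ times the $j$th row from it. Such a matrix $\hat{L}_j$ is called a Gauss transform.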
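
The theorem can be checked numerically on the running example. In this sketch the helper names `gauss_transform` and `matmul` are assumptions made for the illustration; the product with the Gauss transform is compared against the same update carried out as row operations.

```python
def gauss_transform(m, j, lambdas):
    """Return the m x m identity with -lambdas below the diagonal in column j.

    lambdas[k] is lambda_{j+1+k, j}, so it should have m - j - 1 entries.
    """
    L_hat = [[1 if r == c else 0 for c in range(m)] for r in range(m)]
    for k, lam in enumerate(lambdas):
        L_hat[j + 1 + k][j] = -lam
    return L_hat

def matmul(X, Y):
    """Plain matrix-matrix product of two lists of rows."""
    return [[sum(X[i][p] * Y[p][c] for p in range(len(Y)))
             for c in range(len(Y[0]))] for i in range(len(X))]

A = [[2, 4, -2],
     [4, -2, 6],
     [6, -4, 2]]
L_hat_0 = gauss_transform(3, 0, [2, 3])
product = matmul(L_hat_0, A)

# The same result obtained by subtracting multiples of row 0 from rows 1 and 2.
by_rows = [A[0],
           [a - 2 * p for a, p in zip(A[1], A[0])],
           [a - 3 * p for a, p in zip(A[2], A[0])]]
```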

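Exercise
Verify that
\[
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -1.6 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ -3 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 2 & 4 & -2 \\ 4 & -2 & 6 \\ 6 & -4 & 2 \end{pmatrix}
=
\begin{pmatrix} 2 & 4 & -2 \\ 0 & -10 & 10 \\ 0 & 0 & -8 \end{pmatrix}
\]
and
\[
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -1.6 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ -3 & 0 & 1 \end{pmatrix}
\begin{pmatrix} -10 \\ 20 \\ 18 \end{pmatrix}
=
\begin{pmatrix} -10 \\ 40 \\ -16 \end{pmatrix}.
\]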

Gaussian Elimination - Take 5

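Example
Consider
\[
\begin{pmatrix} 1 & 0 & 0 \\ -\lambda_{1,0} & 1 & 0 \\ -\lambda_{2,0} & 0 & 1 \end{pmatrix}
\begin{pmatrix} 2 & 4 & -2 \\ 4 & -2 & 6 \\ 6 & -4 & 2 \end{pmatrix}
=
\begin{pmatrix}
2 & 4 & -2 \\
4 - \lambda_{1,0}(2) & -2 - \lambda_{1,0}(4) & 6 - \lambda_{1,0}(-2) \\
6 - \lambda_{2,0}(2) & -4 - \lambda_{2,0}(4) & 2 - \lambda_{2,0}(-2)
\end{pmatrix}.
\]
How should $\lambda_{1,0}$ and $\lambda_{2,0}$ be chosen so that zeroes are introduced below the diagonal in the first column? Examine $4 - \lambda_{1,0}(2)$ and $6 - \lambda_{2,0}(2)$: the choices $\lambda_{1,0} = 4/2 = 2$ and $\lambda_{2,0} = 6/2 = 3$ have the desired property.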

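Example
Alternatively, we can write this as
\[
\left(\begin{array}{c|c}
1 & \begin{matrix} 0 & 0 \end{matrix} \\ \hline
-\begin{pmatrix} \lambda_{1,0} \\ \lambda_{2,0} \end{pmatrix} & \begin{matrix} 1 & 0 \\ 0 & 1 \end{matrix}
\end{array}\right)
\left(\begin{array}{c|c}
2 & \begin{matrix} 4 & -2 \end{matrix} \\ \hline
\begin{pmatrix} 4 \\ 6 \end{pmatrix} & \begin{pmatrix} -2 & 6 \\ -4 & 2 \end{pmatrix}
\end{array}\right)
=
\left(\begin{array}{c|c}
2 & \begin{matrix} 4 & -2 \end{matrix} \\ \hline
-\begin{pmatrix} \lambda_{1,0} \\ \lambda_{2,0} \end{pmatrix} 2 + \begin{pmatrix} 4 \\ 6 \end{pmatrix} &
-\begin{pmatrix} \lambda_{1,0} \\ \lambda_{2,0} \end{pmatrix} \begin{pmatrix} 4 & -2 \end{pmatrix} + \begin{pmatrix} -2 & 6 \\ -4 & 2 \end{pmatrix}
\end{array}\right).
\]
To zero the elements below the diagonal in the first column:
\[
-\begin{pmatrix} \lambda_{1,0} \\ \lambda_{2,0} \end{pmatrix} 2 + \begin{pmatrix} 4 \\ 6 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}
\]
or, equivalently,
\[
\begin{pmatrix} \lambda_{1,0} \\ \lambda_{2,0} \end{pmatrix} = \begin{pmatrix} 4 \\ 6 \end{pmatrix} / 2 = \begin{pmatrix} 2 \\ 3 \end{pmatrix}.
\]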

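Generalizing this insight
Let $A^{(0)} \in \mathbb{R}^{n \times n}$ and let $\hat{L}^{(0)}$ be a Gauss transform. Partition
\[
A^{(0)} \rightarrow \begin{pmatrix} \alpha_{11}^{(0)} & a_{12}^{(0)T} \\ a_{21}^{(0)} & A_{22}^{(0)} \end{pmatrix},
\qquad
\hat{L}^{(0)} \rightarrow \begin{pmatrix} 1 & 0 \\ -l_{21}^{(0)} & I \end{pmatrix}.
\]
Then
\[
\hat{L}^{(0)} A^{(0)}
= \begin{pmatrix} 1 & 0 \\ -l_{21}^{(0)} & I \end{pmatrix}
\begin{pmatrix} \alpha_{11}^{(0)} & a_{12}^{(0)T} \\ a_{21}^{(0)} & A_{22}^{(0)} \end{pmatrix}
= \begin{pmatrix} \alpha_{11}^{(0)} & a_{12}^{(0)T} \\ a_{21}^{(0)} - l_{21}^{(0)} \alpha_{11}^{(0)} & A_{22}^{(0)} - l_{21}^{(0)} a_{12}^{(0)T} \end{pmatrix}.
\]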

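Generalizing this insight (continued)
\[
\begin{pmatrix} \alpha_{11}^{(0)} & a_{12}^{(0)T} \\ a_{21}^{(0)} - l_{21}^{(0)} \alpha_{11}^{(0)} & A_{22}^{(0)} - l_{21}^{(0)} a_{12}^{(0)T} \end{pmatrix}
\]
Choose $l_{21}^{(0)}$ so that $a_{21}^{(0)} - l_{21}^{(0)} \alpha_{11}^{(0)} = 0$: $l_{21}^{(0)} = a_{21}^{(0)} / \alpha_{11}^{(0)}$.
$A_{22}^{(0)} := A_{22}^{(0)} - l_{21}^{(0)} a_{12}^{(0)T}$: this is a rank-1 update (ger).
Update
\[
A^{(1)} := \begin{pmatrix} 1 & 0 \\ -l_{21}^{(0)} & I \end{pmatrix}
\begin{pmatrix} \alpha_{11}^{(0)} & a_{12}^{(0)T} \\ a_{21}^{(0)} & A_{22}^{(0)} \end{pmatrix}
= \begin{pmatrix} \alpha_{11}^{(0)} & a_{12}^{(0)T} \\ 0 & A_{22}^{(0)} - l_{21}^{(0)} a_{12}^{(0)T} \end{pmatrix}
= \begin{pmatrix} \alpha_{11}^{(1)} & a_{12}^{(1)T} \\ 0 & A_{22}^{(1)} \end{pmatrix}.
\]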
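
As a worked instance of these formulas, take the running example, for which $\alpha_{11}^{(0)} = 2$, $a_{12}^{(0)T} = \begin{pmatrix} 4 & -2 \end{pmatrix}$, $a_{21}^{(0)} = \begin{pmatrix} 4 \\ 6 \end{pmatrix}$, and $A_{22}^{(0)} = \begin{pmatrix} -2 & 6 \\ -4 & 2 \end{pmatrix}$. The multipliers and the rank-1 update then come out as follows, which agrees with the second and third rows obtained by elimination earlier:

```latex
\[
l_{21}^{(0)} = \frac{1}{2}\begin{pmatrix} 4 \\ 6 \end{pmatrix}
             = \begin{pmatrix} 2 \\ 3 \end{pmatrix},
\]
\[
A_{22}^{(0)} - l_{21}^{(0)} a_{12}^{(0)T}
= \begin{pmatrix} -2 & 6 \\ -4 & 2 \end{pmatrix}
  - \begin{pmatrix} 2 \\ 3 \end{pmatrix}\begin{pmatrix} 4 & -2 \end{pmatrix}
= \begin{pmatrix} -2 & 6 \\ -4 & 2 \end{pmatrix}
  - \begin{pmatrix} 8 & -4 \\ 12 & -6 \end{pmatrix}
= \begin{pmatrix} -10 & 10 \\ -16 & 8 \end{pmatrix}.
\]
```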

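Example
Consider
\[
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -\lambda_{2,1} & 1 \end{pmatrix}
\begin{pmatrix} 2 & 4 & -2 \\ 0 & -10 & 10 \\ 0 & -16 & 8 \end{pmatrix}
=
\begin{pmatrix} 2 & 4 & -2 \\ 0 & -10 & 10 \\ 0 & -16 - \lambda_{2,1}(-10) & 8 - \lambda_{2,1}(10) \end{pmatrix}.
\]
How should $\lambda_{2,1}$ be chosen? We need $-16 - \lambda_{2,1}(-10) = 0$, so that $\lambda_{2,1} = -16/(-10) = 1.6$ has the desired property.
Alternatively, we notice that, viewed as a vector,
\[
\begin{pmatrix} \lambda_{2,1} \end{pmatrix} = \begin{pmatrix} -16 \end{pmatrix} / (-10).
\]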

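Moving on
\[
A^{(1)} \rightarrow \begin{pmatrix}
A_{00}^{(1)} & a_{01}^{(1)} & A_{02}^{(1)} \\
0 & \alpha_{11}^{(1)} & a_{12}^{(1)T} \\
0 & a_{21}^{(1)} & A_{22}^{(1)}
\end{pmatrix},
\qquad
\hat{L}^{(1)} \rightarrow \begin{pmatrix} I_1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -l_{21}^{(1)} & I \end{pmatrix}.
\]
Then
\[
\begin{pmatrix} I_1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -l_{21}^{(1)} & I \end{pmatrix}
\begin{pmatrix}
A_{00}^{(1)} & a_{01}^{(1)} & A_{02}^{(1)} \\
0 & \alpha_{11}^{(1)} & a_{12}^{(1)T} \\
0 & a_{21}^{(1)} & A_{22}^{(1)}
\end{pmatrix}
=
\begin{pmatrix}
A_{00}^{(1)} & a_{01}^{(1)} & A_{02}^{(1)} \\
0 & \alpha_{11}^{(1)} & a_{12}^{(1)T} \\
0 & a_{21}^{(1)} - l_{21}^{(1)} \alpha_{11}^{(1)} & A_{22}^{(1)} - l_{21}^{(1)} a_{12}^{(1)T}
\end{pmatrix}.
\]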

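Moving on
Now,
\[
\begin{pmatrix} I_1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -l_{21}^{(1)} & I \end{pmatrix}
\begin{pmatrix}
A_{00}^{(1)} & a_{01}^{(1)} & A_{02}^{(1)} \\
0 & \alpha_{11}^{(1)} & a_{12}^{(1)T} \\
0 & a_{21}^{(1)} & A_{22}^{(1)}
\end{pmatrix}
=
\begin{pmatrix}
A_{00}^{(1)} & a_{01}^{(1)} & A_{02}^{(1)} \\
0 & \alpha_{11}^{(1)} & a_{12}^{(1)T} \\
0 & a_{21}^{(1)} - l_{21}^{(1)} \alpha_{11}^{(1)} & A_{22}^{(1)} - l_{21}^{(1)} a_{12}^{(1)T}
\end{pmatrix}.
\]
Choose $l_{21}^{(1)}$ so that $a_{21}^{(1)} - l_{21}^{(1)} \alpha_{11}^{(1)} = 0$: $l_{21}^{(1)} = a_{21}^{(1)} / \alpha_{11}^{(1)}$.
$A_{22}^{(1)} := A_{22}^{(1)} - l_{21}^{(1)} a_{12}^{(1)T}$: this is a rank-1 update (ger).
\[
A^{(2)} = \begin{pmatrix} I_1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -l_{21}^{(1)} & I \end{pmatrix}
\begin{pmatrix}
A_{00}^{(1)} & a_{01}^{(1)} & A_{02}^{(1)} \\
0 & \alpha_{11}^{(1)} & a_{12}^{(1)T} \\
0 & a_{21}^{(1)} & A_{22}^{(1)}
\end{pmatrix}
=
\begin{pmatrix}
A_{00}^{(1)} & a_{01}^{(1)} & A_{02}^{(1)} \\
0 & \alpha_{11}^{(1)} & a_{12}^{(1)T} \\
0 & 0 & A_{22}^{(2)}
\end{pmatrix}
\]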

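More general yet
\[
A^{(k)} \rightarrow \begin{pmatrix}
A_{00}^{(k)} & a_{01}^{(k)} & A_{02}^{(k)} \\
0 & \alpha_{11}^{(k)} & a_{12}^{(k)T} \\
0 & a_{21}^{(k)} & A_{22}^{(k)}
\end{pmatrix},
\qquad
\hat{L}^{(k)} \rightarrow \begin{pmatrix} I_k & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -l_{21}^{(k)} & I \end{pmatrix},
\]
where $A_{00}^{(k)}$ and $I_k$ are $k \times k$ matrices. Then
\[
\begin{pmatrix} I_k & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -l_{21}^{(k)} & I \end{pmatrix}
\begin{pmatrix}
A_{00}^{(k)} & a_{01}^{(k)} & A_{02}^{(k)} \\
0 & \alpha_{11}^{(k)} & a_{12}^{(k)T} \\
0 & a_{21}^{(k)} & A_{22}^{(k)}
\end{pmatrix}
=
\begin{pmatrix}
A_{00}^{(k)} & a_{01}^{(k)} & A_{02}^{(k)} \\
0 & \alpha_{11}^{(k)} & a_{12}^{(k)T} \\
0 & a_{21}^{(k)} - l_{21}^{(k)} \alpha_{11}^{(k)} & A_{22}^{(k)} - l_{21}^{(k)} a_{12}^{(k)T}
\end{pmatrix}.
\]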

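\[
\begin{pmatrix} I_k & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -l_{21}^{(k)} & I \end{pmatrix}
\begin{pmatrix}
A_{00}^{(k)} & a_{01}^{(k)} & A_{02}^{(k)} \\
0 & \alpha_{11}^{(k)} & a_{12}^{(k)T} \\
0 & a_{21}^{(k)} & A_{22}^{(k)}
\end{pmatrix}
=
\begin{pmatrix}
A_{00}^{(k)} & a_{01}^{(k)} & A_{02}^{(k)} \\
0 & \alpha_{11}^{(k)} & a_{12}^{(k)T} \\
0 & a_{21}^{(k)} - l_{21}^{(k)} \alpha_{11}^{(k)} & A_{22}^{(k)} - l_{21}^{(k)} a_{12}^{(k)T}
\end{pmatrix}.
\]
Choose $l_{21}^{(k)}$ so that $a_{21}^{(k)} - l_{21}^{(k)} \alpha_{11}^{(k)} = 0$: $l_{21}^{(k)} = a_{21}^{(k)} / \alpha_{11}^{(k)}$.
$A_{22}^{(k)} := A_{22}^{(k)} - l_{21}^{(k)} a_{12}^{(k)T}$: this is a rank-1 update (ger).
\[
A^{(k+1)} = \begin{pmatrix} I_k & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -l_{21}^{(k)} & I \end{pmatrix}
\begin{pmatrix}
A_{00}^{(k)} & a_{01}^{(k)} & A_{02}^{(k)} \\
0 & \alpha_{11}^{(k)} & a_{12}^{(k)T} \\
0 & a_{21}^{(k)} & A_{22}^{(k)}
\end{pmatrix}
=
\begin{pmatrix}
A_{00}^{(k)} & a_{01}^{(k)} & A_{02}^{(k)} \\
0 & \alpha_{11}^{(k)} & a_{12}^{(k)T} \\
0 & 0 & A_{22}^{(k+1)}
\end{pmatrix}
\]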

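Algorithm: $A := \mbox{GE\_Take5}(A)$

Partition $A \rightarrow \begin{pmatrix} A_{TL} & A_{TR} \\ A_{BL} & A_{BR} \end{pmatrix}$, where $A_{TL}$ is $0 \times 0$
while $m(A_{TL}) < m(A)$ do
    Repartition $\begin{pmatrix} A_{TL} & A_{TR} \\ A_{BL} & A_{BR} \end{pmatrix} \rightarrow \begin{pmatrix} A_{00} & a_{01} & A_{02} \\ a_{10}^T & \alpha_{11} & a_{12}^T \\ A_{20} & a_{21} & A_{22} \end{pmatrix}$, where $\alpha_{11}$ is $1 \times 1$

    $a_{21} := a_{21} / \alpha_{11}$ $(= l_{21})$
    $A_{22} := A_{22} - a_{21} a_{12}^T$ $(= A_{22} - l_{21} a_{12}^T)$

    Continue with $\begin{pmatrix} A_{TL} & A_{TR} \\ A_{BL} & A_{BR} \end{pmatrix} \leftarrow \begin{pmatrix} A_{00} & a_{01} & A_{02} \\ a_{10}^T & \alpha_{11} & a_{12}^T \\ A_{20} & a_{21} & A_{22} \end{pmatrix}$
endwhile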
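
The loop above can be sketched with ordinary index arithmetic in place of the partitioning notation. This is an illustration, not the course's own implementation: the name `ge_take5` is borrowed from the algorithm title, index `k` stands for the position of $\alpha_{11}$, and the sketch assumes a square matrix and that no zero $\alpha_{11}$ is encountered.

```python
from fractions import Fraction

def ge_take5(A):
    """Overwrite A with U (on and above the diagonal) and the multipliers (below).

    Assumes A is square and that no zero pivot alpha_11 is encountered.
    """
    n = len(A)
    for k in range(n):                 # alpha_11 is A[k][k]
        alpha11 = A[k][k]
        for i in range(k + 1, n):
            A[i][k] = A[i][k] / alpha11            # a21 := a21 / alpha11 (= l21)
        for i in range(k + 1, n):
            for j in range(k + 1, n):
                A[i][j] -= A[i][k] * A[k][j]       # A22 := A22 - a21 a12^T
    return A

A = [[Fraction(v) for v in row]
     for row in [[2, 4, -2], [4, -2, 6], [6, -4, 2]]]
ge_take5(A)
```

After the call, the strictly lower triangular part of `A` holds the multipliers 2, 3, and 1.6 (as the fraction 8/5), and the rest holds the upper triangular matrix.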

Insights
Now, if $A \in \mathbb{R}^{n \times n}$, then
\[
A^{(n)} = \hat{L}^{(n-1)} \cdots \hat{L}^{(1)} \hat{L}^{(0)} A = U,
\]
an upper triangular matrix. Also, to solve $Ax = b$, we note that
\[
Ux = (\hat{L}^{(n-1)} \cdots \hat{L}^{(1)} \hat{L}^{(0)} A) x = \underbrace{\hat{L}^{(n-1)} \cdots \hat{L}^{(1)} \hat{L}^{(0)} b}_{\hat{b}}.
\]
We recognize the right-hand side as forward substitution applied to the vector $b$. We will later see that solving $Ux = \hat{b}$, where $U$ is upper triangular, is equivalent to back substitution.

We worked our way up to GE Take 5 so that the reader, hopefully, now recognizes it as just Gaussian elimination. The insights in this section are summarized in the algorithm, in which the original matrix $A$ is overwritten with the upper triangular matrix that results from Gaussian elimination, and the strictly lower triangular elements are overwritten by the multipliers.

Gaussian Elimination - Take 6

Inverse of a Matrix
Let $A \in \mathbb{R}^{n \times n}$ and $B \in \mathbb{R}^{n \times n}$ have the property that $AB = BA = I$. Then $B$ is said to be the inverse of matrix $A$ and is denoted by $A^{-1}$. Later we will see that for square $A$ and $B$ it is always the case that if $AB = I$ then $BA = I$, and that the inverse of a matrix is unique.

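Example
Let
\[
\hat{L} = \begin{pmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
\quad \mbox{and} \quad
L = \begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.
\]
Then
\[
L \hat{L} = \begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}
= \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}.
\]
This should be intuitively true:
$\hat{L} A$ subtracts two times the first row from the second row.
$L A$ adds two times the first row to the second row.
$L \hat{L} A = L(\hat{L} A) = A$. Why? Two transformations that always undo each other are inverses of each other.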

Exercise
Compute
\[
\begin{pmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ -3 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 3 & 0 & 1 \end{pmatrix}
\]
and reason why the result should be intuitively true.

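Similarly, $\hat{L} L = I$. (Notice that when we use $I$ without indicating its size, the size is whatever the context requires.)

Theorem
If
\[
\hat{L} = \begin{pmatrix} I_k & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -l_{21} & I \end{pmatrix}
\quad \mbox{then} \quad
L = \begin{pmatrix} I_k & 0 & 0 \\ 0 & 1 & 0 \\ 0 & l_{21} & I \end{pmatrix}
\]
is its inverse: $L \hat{L} = \hat{L} L = I$.

Proof
\[
\hat{L} L
= \begin{pmatrix} I_k & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -l_{21} & I \end{pmatrix}
\begin{pmatrix} I_k & 0 & 0 \\ 0 & 1 & 0 \\ 0 & l_{21} & I \end{pmatrix}
= \begin{pmatrix} I_k & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -l_{21} + I l_{21} & I \end{pmatrix}
= \begin{pmatrix} I_k & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & I \end{pmatrix}
= I.
\]
The product $L \hat{L}$ is computed in the same way, with the signs of $l_{21}$ interchanged.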
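
The theorem is easy to check numerically for a small case. The sketch below uses the multipliers 2 and 3 from the running example with $k = 0$; the helper names `identity`, `with_column`, and `matmul` are assumptions made for this illustration.

```python
def identity(n):
    return [[1 if r == c else 0 for c in range(n)] for r in range(n)]

def with_column(n, k, l21, sign):
    """Identity of size n with sign * l21 placed below the diagonal in column k."""
    M = identity(n)
    for offset, value in enumerate(l21):
        M[k + 1 + offset][k] = sign * value
    return M

def matmul(X, Y):
    """Plain matrix-matrix product of two lists of rows."""
    return [[sum(X[i][p] * Y[p][c] for p in range(len(Y)))
             for c in range(len(Y[0]))] for i in range(len(X))]

l21 = [2, 3]
L_hat = with_column(3, 0, l21, -1)   # the Gauss transform
L = with_column(3, 0, l21, +1)       # its claimed inverse

left = matmul(L, L_hat)              # L * L_hat
right = matmul(L_hat, L)             # L_hat * L
```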

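Exercise
Recall that
\[
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -1.6 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 0 & 0 \\ -2 & 1 & 0 \\ -3 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 2 & 4 & -2 \\ 4 & -2 & 6 \\ 6 & -4 & 2 \end{pmatrix}
=
\begin{pmatrix} 2 & 4 & -2 \\ 0 & -10 & 10 \\ 0 & 0 & -8 \end{pmatrix}.
\]
Show that
\[
\begin{pmatrix} 2 & 4 & -2 \\ 4 & -2 & 6 \\ 6 & -4 & 2 \end{pmatrix}
=
\begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 3 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 1.6 & 1 \end{pmatrix}
\begin{pmatrix} 2 & 4 & -2 \\ 0 & -10 & 10 \\ 0 & 0 & -8 \end{pmatrix}.
\]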

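Exercise
Show that
\[
\begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 3 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 1.6 & 1 \end{pmatrix}
=
\begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 3 & 1.6 & 1 \end{pmatrix}
\]
so that
\[
\begin{pmatrix} 2 & 4 & -2 \\ 4 & -2 & 6 \\ 6 & -4 & 2 \end{pmatrix}
=
\begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 3 & 1.6 & 1 \end{pmatrix}
\begin{pmatrix} 2 & 4 & -2 \\ 0 & -10 & 10 \\ 0 & 0 & -8 \end{pmatrix}.
\]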

Theorem
Let $\hat{L}^{(0)}, \ldots, \hat{L}^{(n-1)}$ be the sequence of Gauss transforms that transform an $n \times n$ matrix $A$ to an upper triangular matrix:
\[
\hat{L}^{(n-1)} \cdots \hat{L}^{(0)} A = U.
\]
Then
\[
A = L^{(0)} \cdots L^{(n-2)} L^{(n-1)} U,
\]
where $L^{(j)} = \hat{L}^{(j)-1}$, the inverse of $\hat{L}^{(j)}$.

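Proof
If
\[
\hat{L}^{(n-1)} \hat{L}^{(n-2)} \cdots \hat{L}^{(0)} A = U,
\]
then
\[
\begin{aligned}
A &= \underbrace{L^{(0)} \cdots \underbrace{L^{(n-2)} \underbrace{L^{(n-1)} \hat{L}^{(n-1)}}_{I} \hat{L}^{(n-2)}}_{I} \cdots \hat{L}^{(0)}}_{I} A \\
  &= L^{(0)} \cdots L^{(n-2)} L^{(n-1)} \underbrace{\hat{L}^{(n-1)} \hat{L}^{(n-2)} \cdots \hat{L}^{(0)} A}_{U} \\
  &= L^{(0)} \cdots L^{(n-2)} L^{(n-1)} U.
\end{aligned}
\]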

Lemma
Let $\hat{L}^{(0)}, \ldots, \hat{L}^{(n-1)}$ be the sequence of Gauss transforms that transforms a matrix $A$ into an upper triangular matrix $U$:
\[
\hat{L}^{(n-1)} \cdots \hat{L}^{(0)} A = U,
\]
and let $L^{(j)} = \hat{L}^{(j)-1}$. Then
\[
\tilde{L}^{(k)} = L^{(0)} \cdots L^{(k-1)} L^{(k)}
\]
has the structure
\[
\tilde{L}^{(k)} = \begin{pmatrix} \tilde{L}_{TL}^{(k)} & 0 \\ \tilde{L}_{BL}^{(k)} & I \end{pmatrix},
\]
where $\tilde{L}_{TL}^{(k)}$ is a $(k+1) \times (k+1)$ unit lower triangular matrix.

Proof
Proof by induction on $k$.
Base case: $k = 0$.
\[
\tilde{L}^{(0)} = L^{(0)} = \begin{pmatrix} 1 & 0 \\ l_{21}^{(0)} & I \end{pmatrix}
\]
meets the desired criteria since $1$ is a trivial unit lower triangular matrix.

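Inductive step: Assume $\tilde{L}^{(k)}$ meets the indicated criteria. We will show that then $\tilde{L}^{(k+1)}$ does too. Let
\[
\tilde{L}^{(k)} = \begin{pmatrix} \tilde{L}_{TL}^{(k)} & 0 \\ \tilde{L}_{BL}^{(k)} & I \end{pmatrix}
= \begin{pmatrix} \tilde{L}_{00}^{(k)} & 0 & 0 \\ \tilde{l}_{10}^{(k)T} & 1 & 0 \\ \tilde{L}_{20}^{(k)} & 0 & I \end{pmatrix},
\]
where $\tilde{L}_{TL}^{(k)}$ (and hence $\tilde{L}_{00}^{(k)}$) are unit lower triangular matrices of dimension $(k+1) \times (k+1)$. Then
\[
\begin{aligned}
\tilde{L}^{(k+1)} = \tilde{L}^{(k)} L^{(k+1)}
&= \begin{pmatrix} \tilde{L}_{00}^{(k)} & 0 & 0 \\ \tilde{l}_{10}^{(k)T} & 1 & 0 \\ \tilde{L}_{20}^{(k)} & 0 & I \end{pmatrix}
   \begin{pmatrix} I_{k+1} & 0 & 0 \\ 0 & 1 & 0 \\ 0 & l_{21}^{(k+1)} & I \end{pmatrix} \\
&= \begin{pmatrix} \tilde{L}_{00}^{(k)} & 0 & 0 \\ \tilde{l}_{10}^{(k)T} & 1 & 0 \\ \tilde{L}_{20}^{(k)} & l_{21}^{(k+1)} & I \end{pmatrix}
 = \begin{pmatrix} \tilde{L}_{TL}^{(k+1)} & 0 \\ \tilde{L}_{BL}^{(k+1)} & I \end{pmatrix},
\end{aligned}
\]
with
\[
\tilde{L}_{TL}^{(k+1)} = \begin{pmatrix} \tilde{L}_{00}^{(k)} & 0 \\ \tilde{l}_{10}^{(k)T} & 1 \end{pmatrix}
\quad \mbox{and} \quad
\tilde{L}_{BL}^{(k+1)} = \begin{pmatrix} \tilde{L}_{20}^{(k)} & l_{21}^{(k+1)} \end{pmatrix},
\]
which meets the desired criteria since $\tilde{L}_{TL}^{(k+1)}$ is unit lower triangular.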

By the Principle of Mathematical Induction the result holds for $\tilde{L}^{(j)}$, $0 \le j < n-1$. Since $l_{21}^{(n-1)}$ has length zero, $L^{(n-1)} = I$, so the result holds for $j = n-1$ as well.

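Corollary
Under the conditions of the Lemma, $L = \tilde{L}^{(n-1)}$ is a unit lower triangular matrix, the strictly lower triangular part of which is the sum of all the strictly lower triangular parts of $L^{(0)}, \ldots, L^{(n-1)}$:
\[
L = \tilde{L}^{(n-1)} = \begin{pmatrix}
1 & 0 & 0 & \cdots & 0 & 0 \\
 & 1 & 0 & \cdots & 0 & 0 \\
 & & 1 & \cdots & 0 & 0 \\
l_{21}^{(0)} & l_{21}^{(1)} & & \ddots & \vdots & \vdots \\
 & & l_{21}^{(2)} & & 1 & 0 \\
 & & & \cdots & l_{21}^{(n-2)} & 1
\end{pmatrix},
\]
that is, column $j$ of $L$ holds the vector $l_{21}^{(j)}$ below its unit diagonal element.
(Note that $l_{21}^{(n-1)}$ is a vector of length zero, so that the last step, involving $L^{(n-1)}$, is really a no-op.)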

Example
A consequence of this corollary is that the fact that
\[
\begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 3 & 0 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 1.6 & 1 \end{pmatrix}
=
\begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 3 & 1.6 & 1 \end{pmatrix}
\]
in a previous Exercise is not a coincidence. For these matrices, all you have to do to find the strictly lower triangular part of the right-hand side is to move the nonzeroes below the diagonal in the matrices on the left-hand side to the corresponding elements in the matrix on the right-hand side of the equality sign.

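Exercise
The order in which the Gauss transforms appear is important. In particular, verify that
\[
\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 1.6 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 3 & 0 & 1 \end{pmatrix}
\neq
\begin{pmatrix} 1 & 0 & 0 \\ 2 & 1 & 0 \\ 3 & 1.6 & 1 \end{pmatrix}.
\]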
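
A quick numerical check makes the role of the order concrete. The sketch below, with illustrative variable names, multiplies the two inverse Gauss transforms of the running example in both orders and compares each product with the matrix obtained by simply merging their subdiagonal entries. Exact fractions are used for 1.6.

```python
from fractions import Fraction

def matmul(X, Y):
    """Plain matrix-matrix product of two lists of rows."""
    return [[sum(X[i][p] * Y[p][c] for p in range(len(Y)))
             for c in range(len(Y[0]))] for i in range(len(X))]

L0 = [[1, 0, 0], [2, 1, 0], [3, 0, 1]]                     # L^(0)
L1 = [[1, 0, 0], [0, 1, 0], [0, Fraction(8, 5), 1]]        # L^(1)
merged = [[1, 0, 0], [2, 1, 0], [3, Fraction(8, 5), 1]]

in_order = matmul(L0, L1)         # L^(0) L^(1): equals the merged matrix
reversed_order = matmul(L1, L0)   # L^(1) L^(0): does not
```

In the reversed product the (2, 0) element becomes $1.6 \cdot 2 + 3 = 6.2$ rather than 3, which is why the multipliers can only be merged when the factors appear in the order $L^{(0)} L^{(1)}$.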

Theorem
Let $\hat{L}^{(0)}, \ldots, \hat{L}^{(n-1)}$ be the sequence of Gauss transforms that transforms an $n \times n$ matrix $A$ into an upper triangular matrix $U$:
\[
\hat{L}^{(n-1)} \cdots \hat{L}^{(0)} A = U,
\]
and let $L^{(j)} = \hat{L}^{(j)-1}$. Then $A = LU$, where $L = L^{(0)} \cdots L^{(n-1)}$ is a unit lower triangular matrix that can be easily obtained from $\hat{L}^{(0)}, \ldots, \hat{L}^{(n-1)}$ by the observation summarized in the last Corollary.

Note
The Theorem does not say that Gaussian elimination is well-defined for every square matrix. It merely says that if Gaussian elimination as presented thus far completes, then there is a unit lower triangular matrix $L$ and an upper triangular matrix $U$ such that $A = LU$.
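
Putting the pieces together, an LU factorization can be sketched as Gaussian elimination that records each multiplier in $L$ as it is computed. This is a minimal illustration under stated assumptions, not a robust routine: the names `lu` and `matmul` are chosen here, no reordering of rows is attempted, and a zero pivot simply raises `ZeroDivisionError`, which corresponds to the case in which Gaussian elimination as presented does not complete.

```python
from fractions import Fraction

def lu(A):
    """Return (L, U) with A = L U, L unit lower triangular, U upper triangular.

    Raises ZeroDivisionError if a zero pivot is met, in which case Gaussian
    elimination as presented here does not complete.
    """
    n = len(A)
    U = [[Fraction(v) for v in row] for row in A]
    L = [[Fraction(int(r == c)) for c in range(n)] for r in range(n)]
    for k in range(n):
        for i in range(k + 1, n):
            L[i][k] = U[i][k] / U[k][k]            # multiplier lambda_{i,k}
            # subtract the multiplier times row k from row i
            U[i] = [u - L[i][k] * p for u, p in zip(U[i], U[k])]
    return L, U

def matmul(X, Y):
    """Plain matrix-matrix product of two lists of rows."""
    return [[sum(X[i][p] * Y[p][c] for p in range(len(Y)))
             for c in range(len(Y[0]))] for i in range(len(X))]

A = [[2, 4, -2],
     [4, -2, 6],
     [6, -4, 2]]
L, U = lu(A)
```

Multiplying `L` and `U` with `matmul` recovers `A` for this example.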