
Lecture 2 INF-MAT 4350 2009: 7.1-7.6, LU, symmetric LU, positive (semi)definite matrices, Cholesky, semi-Cholesky. Tom Lyche and Michael Floater, Centre of Mathematics for Applications, Department of Informatics, University of Oslo. August 27, 2009

Triangular Matrices (from week 1) Recall: The product of two upper (lower) triangular matrices is upper (lower) triangular. A triangular matrix is nonsingular if and only if all its diagonal elements are nonzero. The inverse of a nonsingular upper (lower) triangular matrix is upper (lower) triangular. A matrix is unit triangular if it is triangular with 1's on the diagonal. The product of two unit upper (lower) triangular matrices is unit upper (lower) triangular. A unit upper (lower) triangular matrix is invertible and its inverse is unit upper (lower) triangular.

LU Factorization We say that A = LR is an LU factorization of A ∈ R^{n,n} if L ∈ R^{n,n} is lower (left) triangular and R ∈ R^{n,n} is upper (right) triangular. In addition we will assume that L is unit triangular. Example
$$A = \begin{pmatrix} 2 & -1 \\ -1 & 2 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ -1/2 & 1 \end{pmatrix} \begin{pmatrix} 2 & -1 \\ 0 & 3/2 \end{pmatrix}.$$
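As a quick sanity check, the factors can be multiplied back together in MATLAB (a small sketch of ours, not part of the original slides):

A = [2 -1; -1 2];
L = [1 0; -1/2 1];   % unit lower triangular
R = [2 -1; 0 3/2];   % upper triangular
norm(A - L*R)        % 0: the factorization is exact here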

Example Not every matrix has an LU factorization. An LU factorization of
$$A = \begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix}$$
must satisfy the equations
$$\begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ l_1 & 1 \end{pmatrix} \begin{pmatrix} r_1 & r_3 \\ 0 & r_2 \end{pmatrix} = \begin{pmatrix} r_1 & r_3 \\ l_1 r_1 & l_1 r_3 + r_2 \end{pmatrix}$$
for the unknowns l_1 in L and r_1, r_2, r_3 in R. Comparing (1,1)-elements we see that r_1 = 0, which makes it impossible to satisfy the condition 1 = l_1 r_1 for the (2,1)-element. We conclude that A has no LU factorization.

Submatrices Let A ∈ C^{n,n} and r = [r_1, ..., r_k] for some 1 ≤ r_1 < ... < r_k ≤ n. Principal submatrix: B = A(r, r), i.e., b_{i,j} = a_{r_i,r_j}. Leading principal submatrix: B = A_k := A(1:k, 1:k). The determinant of a (leading) principal submatrix is called a (leading) principal minor. The principal submatrices of
$$A = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{pmatrix}$$
are [1], [5], [9], $\begin{pmatrix} 1 & 2 \\ 4 & 5 \end{pmatrix}$, $\begin{pmatrix} 1 & 3 \\ 7 & 9 \end{pmatrix}$, $\begin{pmatrix} 5 & 6 \\ 8 & 9 \end{pmatrix}$, and A itself. The leading principal submatrices are [1], $\begin{pmatrix} 1 & 2 \\ 4 & 5 \end{pmatrix}$, and A.
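In MATLAB the index notation used above extracts these directly (our sketch):

A = [1 2 3; 4 5 6; 7 8 9];
r = [1 3];
B = A(r,r)         % principal submatrix [1 3; 7 9]
k = 2;
Ak = A(1:k,1:k)    % leading principal submatrix [1 2; 4 5]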

Existence and Uniqueness of LU Theorem Suppose the leading principal submatrices A_k of A ∈ C^{n,n} are nonsingular for k = 1, ..., n-1. Then A has a unique LU factorization. Note that A = A_n itself may be singular, as in the example
$$\begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix},$$
where A_1 = [1] is nonsingular.

Proof Proof by induction on n. For n = 1 we have [a_{11}] = [1][a_{11}]. Suppose A_1, ..., A_{n-1} are nonsingular and that A_{n-1} has a unique LU factorization A_{n-1} = L_{n-1} R_{n-1}. Since A_{n-1} is nonsingular, L_{n-1} and R_{n-1} are nonsingular. But then
$$A = \begin{pmatrix} A_{n-1} & b \\ c^T & a_{nn} \end{pmatrix} = \begin{pmatrix} L_{n-1} & 0 \\ c^T R_{n-1}^{-1} & 1 \end{pmatrix} \begin{pmatrix} R_{n-1} & v \\ 0 & a_{nn} - c^T R_{n-1}^{-1} v \end{pmatrix} = LR, \qquad v := L_{n-1}^{-1} b,$$
is an LU factorization of A. Since L_{n-1} and R_{n-1} are nonsingular, the (2,1) block of L is uniquely determined, and then r_{nn} = a_{nn} - c^T R_{n-1}^{-1} v is also determined uniquely by the construction. Thus the LU factorization is unique.

Using block multiplication one can show: Lemma Suppose A = LR is the LU factorization of A ∈ R^{n,n}. For k = 1, ..., n let A_k, L_k, R_k be the leading principal submatrices of A, L, R, respectively. Then A_k = L_k R_k is the LU factorization of A_k for k = 1, ..., n. Example
$$A = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 4 & 1 & 0 \\ 7 & 2 & 1 \end{pmatrix} \begin{pmatrix} 1 & 2 & 3 \\ 0 & -3 & -6 \\ 0 & 0 & 0 \end{pmatrix} = LR,$$
$$A_1 = [1] = [1][1] = L_1 R_1, \qquad A_2 = \begin{pmatrix} 1 & 2 \\ 4 & 5 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 4 & 1 \end{pmatrix} \begin{pmatrix} 1 & 2 \\ 0 & -3 \end{pmatrix} = L_2 R_2.$$
Here R(3,3) = 0 and A is singular.

A Converse Theorem Suppose A ∈ C^{n,n} has an LU factorization. If A is nonsingular then the leading principal submatrices A_k are nonsingular for k = 1, ..., n-1 and the LU factorization is unique. Proof: Suppose A is nonsingular with the LU factorization A = LR. Since A is nonsingular, L and R are nonsingular. By the Lemma we have A_k = L_k R_k. L_k is unit lower triangular and therefore nonsingular. R_k is nonsingular since its diagonal entries are among the nonzero diagonal entries of R. But then A_k is nonsingular for all k, and uniqueness follows. Remark The LU factorization of a singular matrix need not be unique. For the zero matrix, any unit lower triangular matrix can be used as L in an LU factorization (with R = 0).

Symmetric LU Factorization For a symmetric matrix the LU factorization can be written in a special form:
$$A = \begin{pmatrix} 2 & -1 \\ -1 & 2 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ -1/2 & 1 \end{pmatrix} \begin{pmatrix} 2 & -1 \\ 0 & 3/2 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ -1/2 & 1 \end{pmatrix} \begin{pmatrix} 2 & 0 \\ 0 & 3/2 \end{pmatrix} \begin{pmatrix} 1 & -1/2 \\ 0 & 1 \end{pmatrix}.$$
In the last product the first and last matrices are transposes of each other, so A = LDL^T; this is the symmetric LU factorization, and A = LR with R = DL^T. Definition Suppose A ∈ R^{n,n}. A factorization A = LDL^T, where L is unit lower triangular and D is diagonal, is called a symmetric LU factorization.

LDL^T Characterization Theorem Suppose A ∈ R^{n,n} is nonsingular. Then A has a symmetric LU factorization A = LDL^T if and only if A = A^T and A_k is nonsingular for k = 1, ..., n-1. The symmetric LU factorization is unique.
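The factorization can be computed column by column. Below is a minimal unpivoted MATLAB sketch of ours (the function name ldlt and the loop organization are not from the slides); it assumes A is symmetric with nonsingular leading principal submatrices. MATLAB's built-in ldl computes a pivoted block-diagonal variant.

function [L,d] = ldlt(A)
% Unpivoted LDL^T: returns unit lower triangular L and the
% diagonal of D as a vector d, so that A = L*diag(d)*L'.
n = size(A,1); L = eye(n); d = zeros(n,1);
for k = 1:n
    d(k) = A(k,k) - (L(k,1:k-1).^2)*d(1:k-1);
    for i = k+1:n
        L(i,k) = (A(i,k) - (L(i,1:k-1).*L(k,1:k-1))*d(1:k-1))/d(k);
    end
end

For the 2-by-2 example above, [L,d] = ldlt([2 -1; -1 2]) returns L = [1 0; -1/2 1] and d = [2; 3/2].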

Block LU Factorization Suppose A ∈ R^{n,n} is a block matrix of the form
$$A := \begin{pmatrix} A_{11} & \cdots & A_{1m} \\ \vdots & & \vdots \\ A_{m1} & \cdots & A_{mm} \end{pmatrix}, \qquad (1)$$
where each diagonal block A_{ii} is square. We call the factorization
$$A = LR = \begin{pmatrix} I & & & \\ L_{21} & I & & \\ \vdots & \ddots & \ddots & \\ L_{m1} & \cdots & L_{m,m-1} & I \end{pmatrix} \begin{pmatrix} R_{11} & R_{12} & \cdots & R_{1m} \\ & R_{22} & \cdots & R_{2m} \\ & & \ddots & \vdots \\ & & & R_{mm} \end{pmatrix} \qquad (2)$$
a block LU factorization of A. Here the i-th diagonal block I in L and R_{ii} in R have the same order as A_{ii}.

Block LU The results for elementwise LU factorization carry over to block LU factorization as follows. Theorem Suppose A ∈ R^{n,n} is a block matrix of the form (1), and the leading principal block submatrices
$$A_k := \begin{pmatrix} A_{11} & \cdots & A_{1k} \\ \vdots & & \vdots \\ A_{k1} & \cdots & A_{kk} \end{pmatrix}$$
are nonsingular for k = 1, ..., m-1. Then A has a unique block LU factorization (2). Conversely, if A is nonsingular and has a block LU factorization then A_k is nonsingular for k = 1, ..., m-1.

Why Block LU? The number of flops for the block LU factorization is the same as for the ordinary LU factorization. An advantage of the block method is that it groups the arithmetic into matrix-matrix operations, which can be implemented efficiently on modern machines.

The PLU Factorization A nonsingular matrix A ∈ R^{n,n} has an LU factorization if and only if the leading principal submatrices A_k are nonsingular for k = 1, ..., n-1. This condition seems fairly restrictive. However, for a nonsingular matrix A there is always a permutation of the rows so that the permuted matrix has an LU factorization. We obtain a factorization of the form P^T A = LR, or equivalently A = PLR, where P is a permutation matrix, L is unit lower triangular, and R is upper triangular. We call this a PLU factorization of A.
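MATLAB's built-in lu computes such a factorization with partial pivoting; note that it returns a permutation with P*A = L*R, so its P plays the role of P^T in the slide's convention A = PLR. A quick check on the matrix from the earlier example that had no LU factorization:

A = [0 1; 1 1];
[L,R,P] = lu(A);    % P*A = L*R
P                   % swaps the two rows
norm(P*A - L*R)     % 0: the row-permuted matrix has an LU factorization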

Positive (Semi)Definite Matrices Suppose A ∈ R^{n,n} is a square matrix. The function f : R^n → R given by
$$f(x) = x^T A x = \sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij} x_i x_j$$
is called a quadratic form. We say that A is (i) positive definite if x^T A x > 0 for all nonzero x ∈ R^n; (ii) positive semidefinite if x^T A x ≥ 0 for all x ∈ R^n; (iii) negative (semi)definite if -A is positive (semi)definite; (iv) symmetric positive (semi)definite if A is symmetric in addition to being positive (semi)definite; (v) symmetric negative (semi)definite if A is symmetric in addition to being negative (semi)definite.

Observations A matrix A is positive definite if it is positive semidefinite and in addition
$$x^T A x = 0 \implies x = 0. \qquad (3)$$
A positive definite matrix must be nonsingular: if Ax = 0 for some x ∈ R^n then x^T A x = 0, which by (3) implies that x = 0. The (nonsymmetric) matrix $\begin{pmatrix} 3 & 2 \\ 1 & 2 \end{pmatrix}$ is positive definite. The zero matrix is symmetric positive semidefinite, while the identity matrix is symmetric positive definite. The second derivative matrix T = tridiag(-1, 2, -1) ∈ R^{n,n} is symmetric positive definite.
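A quick numerical illustration of the last claim (our sketch): build T for a modest n and confirm that its smallest eigenvalue is positive.

n = 10;
T = full(gallery('tridiag', n, -1, 2, -1));  % second derivative matrix
min(eig(T))   % positive, consistent with T being symmetric positive definite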

Useful Results Theorem Let m, n be positive integers. If A ∈ R^{n,n} is positive semidefinite and X ∈ R^{n,m}, then B := X^T A X ∈ R^{m,m} is positive semidefinite. If in addition A is positive definite and X has linearly independent columns, then B is positive definite. Proof. Let y ∈ R^m and set x := Xy. Then y^T B y = x^T A x ≥ 0. If A is positive definite and X has linearly independent columns, then x is nonzero whenever y is nonzero, and y^T B y = x^T A x > 0. Taking A := I and X := A we obtain: Corollary Let m, n be positive integers. If A ∈ R^{m,n} then A^T A is positive semidefinite. If in addition A has linearly independent columns then A^T A is positive definite.
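Numerically (our sketch): the columns of a random tall matrix are linearly independent with probability one, so A'*A should be positive definite.

A = randn(5,3);   % columns independent with probability 1
B = A'*A;
min(eig(B))       % positive: B is symmetric positive definite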

More Useful Results Theorem Any principal submatrix of a positive (semi)definite matrix is positive (semi)definite. Proof. Suppose the submatrix B is defined by rows and columns r_1, ..., r_k of A. Then B = X^T A X, where X = [e_{r_1}, ..., e_{r_k}] ∈ R^{n,k}, and B is positive (semi)definite by the previous theorem. If A is positive definite then the leading principal submatrices are nonsingular and we obtain: Corollary A positive definite matrix has a unique LU factorization.

What about the Eigenvalues? Theorem A positive (semi)definite matrix A has positive (nonnegative) eigenvalues. Conversely, if A has positive (nonnegative) eigenvalues and orthonormal eigenvectors then it is positive (semi)definite. Proof. Consider the positive definite case. If Ax = λx with x ≠ 0, then
$$\lambda = \frac{x^T A x}{x^T x} > 0.$$
Suppose conversely that A ∈ R^{n,n} has eigenpairs (λ_j, u_j), j = 1, ..., n, where the eigenvalues are positive and the eigenvectors satisfy u_i^T u_j = δ_{ij} for i, j = 1, ..., n. Let U := [u_1, ..., u_n] ∈ R^{n,n} and D := diag(λ_1, ..., λ_n). Then AU = UD and U^T U = I, so U^T A U = D. Let x ∈ R^n be nonzero and define c ∈ R^n by Uc = x. Then
$$x^T A x = (Uc)^T A (Uc) = c^T U^T A U c = c^T D c = \sum_{j=1}^{n} \lambda_j c_j^2 > 0.$$
The positive semidefinite case is similar.

What about the Determinant? Theorem If A is positive (semi)definite then det(A) > 0 (det(A) ≥ 0). Proof. Since the determinant of a matrix equals the product of its eigenvalues, this follows from the previous theorem.

The Symmetric Case Lemma If A is symmetric positive semidefinite then for all i, j: 1. |a_{ij}| ≤ (a_{ii} + a_{jj})/2; 2. a_{ij}^2 ≤ a_{ii} a_{jj}. Proof. For all i, j and α, β ∈ R,
$$0 \le (\alpha e_i + \beta e_j)^T A (\alpha e_i + \beta e_j) = \alpha^2 a_{ii} + \beta^2 a_{jj} + 2\alpha\beta a_{ij}.$$
Taking α = 1, β = ±1 gives a_{ii} + a_{jj} ± 2a_{ij} ≥ 0, which proves 1. Part 2 follows trivially from 1 if a_{ii} = a_{jj} = 0. Suppose one of them, say a_{ii}, is positive. Taking α = a_{ij}, β = -a_{ii} we find
$$0 \le a_{ij}^2 a_{ii} + a_{ii}^2 a_{jj} - 2 a_{ij}^2 a_{ii} = a_{ii}(a_{ii} a_{jj} - a_{ij}^2).$$
But then a_{ii} a_{jj} - a_{ij}^2 ≥ 0 and 2 follows.

A Consequence If A is symmetric positive semidefinite and one diagonal element is zero, say a_{ii} = 0, then all elements in row i and column i must also be zero. Indeed, since a_{ij}^2 ≤ a_{ii} a_{jj} we have a_{ij} = 0 for all j, and by symmetry a_{ji} = 0 for all j. In particular, if A ∈ R^{n,n} is symmetric positive semidefinite and a_{11} = 0 then A has the form
$$A = \begin{pmatrix} 0 & 0^T \\ 0 & B \end{pmatrix}, \quad B \in \mathbb{R}^{n-1,n-1}.$$
Example:
$$A_1 = \begin{pmatrix} 0 & 1 \\ 1 & 2 \end{pmatrix}, \quad A_2 = \begin{pmatrix} 1 & 2 \\ 2 & 3 \end{pmatrix}, \quad A_3 = \begin{pmatrix} 2 & 1 \\ 1 & -2 \end{pmatrix}.$$
None of them is symmetric positive semidefinite: A_1 has a zero diagonal element but a nonzero first row, A_2 has a_{12}^2 = 4 > 3 = a_{11} a_{22}, and A_3 has a negative diagonal element.

Cholesky Factorization Definition 1. A factorization A = R^T R where R is upper triangular with positive diagonal elements is called a Cholesky factorization. 2. A factorization A = R^T R where R is upper triangular with nonnegative diagonal elements is called a semi-Cholesky factorization. Theorem Let A ∈ R^{n,n}. 1. A has a Cholesky factorization if and only if it is symmetric positive definite. 2. A has a semi-Cholesky factorization if and only if it is symmetric positive semidefinite.
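MATLAB's built-in chol uses the same upper triangular convention, returning R with R'*R = A (and raising an error if A is not positive definite):

A = [2 -1; -1 2];   % symmetric positive definite
R = chol(A);        % upper triangular with positive diagonal
norm(A - R'*R)      % ~0 up to roundoff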

Proof Outline, Positive Semidefinite Case If A = R^T R is a semi-Cholesky factorization then A is symmetric positive semidefinite by the Corollary above. Conversely, suppose A ∈ R^{n,n} is symmetric positive semidefinite. We use induction on n and partition A as
$$A = \begin{pmatrix} \alpha & v^T \\ v & B \end{pmatrix}, \quad \alpha \in \mathbb{R},\ v \in \mathbb{R}^{n-1},\ B \in \mathbb{R}^{n-1,n-1}.$$
Here α = a_{11} = e_1^T A e_1 ≥ 0. If α = 0 then v = 0 (a zero diagonal element forces a zero row and column). The principal submatrix B is positive semidefinite, so by induction B has a semi-Cholesky factorization B = R_1^T R_1, and
$$R = \begin{pmatrix} 0 & 0^T \\ 0 & R_1 \end{pmatrix}$$
is a semi-Cholesky factorization of A.

Proof Continued Now suppose α > 0 and set β := √α. Then C := B - vv^T/α is symmetric positive semidefinite. By induction C has a semi-Cholesky factorization C = R_1^T R_1, and
$$R := \begin{pmatrix} \beta & v^T/\beta \\ 0 & R_1 \end{pmatrix}$$
is a semi-Cholesky factorization of A, since R^T R reproduces α, v, and B = C + vv^T/α blockwise.

Criteria, Symmetric Positive Semidefinite Case Theorem The following statements are equivalent for a symmetric matrix A ∈ R^{n,n}: 1. A is positive semidefinite. 2. A has only nonnegative eigenvalues. 3. A = B^T B for some B ∈ R^{n,n}. 4. All principal minors of A are nonnegative.

Criteria, Symmetric Positive Definite Case Theorem The following statements are equivalent for a symmetric matrix A ∈ R^{n,n}: 1. A is positive definite. 2. A has only positive eigenvalues. 3. All leading principal minors of A are positive. 4. A = B^T B for a nonsingular B ∈ R^{n,n}.
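Criterion 3 is easy to check numerically for small matrices (our sketch):

A = [2 -1 0; -1 2 -1; 0 -1 2];   % T = tridiag(-1,2,-1) for n = 3
for k = 1:3
    det(A(1:k,1:k))              % 2, 3, 4: all positive, so A is SPD
end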

Banded Case Recall that a matrix A has bandwidth d ≥ 0 if a_{ij} = 0 for |i - j| > d. (Semi-)Cholesky factorization preserves bandwidth. Corollary The Cholesky factor
$$R := \begin{pmatrix} \beta & v^T/\beta \\ 0 & R_1 \end{pmatrix}$$
has the same bandwidth as A. Proof. Suppose
$$A = \begin{pmatrix} \alpha & v^T \\ v & B \end{pmatrix} \in \mathbb{R}^{n,n}$$
has bandwidth d ≥ 0. Then v^T = [u^T, 0^T] with u ∈ R^d, so
$$vv^T = \begin{pmatrix} u \\ 0 \end{pmatrix} \begin{pmatrix} u^T & 0^T \end{pmatrix} = \begin{pmatrix} uu^T & 0 \\ 0 & 0 \end{pmatrix},$$
and C = B - vv^T/α differs from B only in the upper left d × d corner. Hence C has the same bandwidth as B and A. By induction on n, C = R_1^T R_1 where R_1 has the same bandwidth as C. But then R has the same bandwidth as A.
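This is visible numerically (our sketch): the Cholesky factor of a tridiagonal SPD matrix is again tridiagonal, i.e., there is no fill-in outside the band.

n = 8;
T = full(gallery('tridiag', n, -1, 2, -1));
R = chol(T);
nnz(triu(R,2))   % 0: R has the same bandwidth d = 1 as T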

Towards an Algorithm Since A is symmetric we only need to use the upper part of A. The first row of R is [β, v^T/β] if α > 0 and zero if α = 0. We store the first row of R in the first row of A, and the upper part of C = B - vv^T/α in the upper part of A(2:n, 2:n). The first row of R and the upper part of C can be computed as follows:

if A(1,1) > 0
    A(1,1) = sqrt(A(1,1));
    A(1,2:n) = A(1,2:n)/A(1,1);
    for i = 2:n
        A(i,i:n) = A(i,i:n) - A(1,i)*A(1,i:n);    (4)
    end
end

Cholesky and Semi-Cholesky [bandcholesky]

function R=bandcholesky(A,d)
n=length(A);
for k=1:n
    kp=min(n,k+d);                    % last column inside the band
    if A(k,k)>0
        A(k,k)=sqrt(A(k,k));
        A(k,k+1:kp)=A(k,k+1:kp)/A(k,k);
        for i=k+1:kp
            A(i,i:kp)=A(i,i:kp)-A(k,i)*A(k,i:kp);   % rank-one update within the band
        end
    else
        A(k,k:kp)=zeros(1,kp-k+1);    % semidefinite case: zero out row k
    end
end
R=triu(A);

Comments We overwrite the upper triangle of A with the elements of R. Row k of R is zero for those k where r_{kk} = 0; we reduce round-off noise by forcing those rows to be exactly zero. There are many versions of the Cholesky factorization; see the book by Golub and Van Loan. The algorithm is based on outer products vv^T. An advantage of this formulation is that it extends to positive semidefinite matrices.

Banded Forward Substitution [bandforwardsolve] Solves the lower triangular system R^T y = b, where R is upper triangular and banded with r_{kj} = 0 for j - k > d.

function y=bandforwardsolve(R,b,d)
n=length(b); y=b(:);
for k=1:n
    km=max(1,k-d);                              % first row inside the band
    y(k)=(y(k)-R(km:k-1,k)'*y(km:k-1))/R(k,k);  % column k of R is row k of R^T
end

Banded Backward Substitution [bandbacksolve] Solves the upper triangular system Rx = y, where R is upper triangular and banded with r_{kj} = 0 for j - k > d.

function x=bandbacksolve(R,y,d)
n=length(y); x=y(:);
for k=n:-1:1
    kp=min(n,k+d);                              % last column inside the band
    x(k)=(x(k)-R(k,k+1:kp)*x(k+1:kp))/R(k,k);
end
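A small end-to-end check of the three routines (our sketch, assuming the functions above are saved on the MATLAB path): factor the second derivative matrix and solve Tx = b.

n = 100; d = 1;
T = full(gallery('tridiag', n, -1, 2, -1));   % SPD with bandwidth 1
b = ones(n,1);
R = bandcholesky(T,d);            % T = R'*R
y = bandforwardsolve(R,b,d);      % solve R'*y = b
x = bandbacksolve(R,y,d);         % solve R*x = y
norm(T*x - b)                     % small residual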

Number of Flops, Discussion For a full matrix the factorization costs about n^3/3 flops, half of what is needed for Gaussian elimination. For a banded matrix with bandwidth d the cost is O(nd^2). The method is restricted to positive (semi)definite matrices. There are many versions of the Cholesky factorization tuned to different machine architectures. The symmetric LU factorization can be used for many symmetric matrices that are not positive definite.