The Singular Value Decomposition and Least Squares Problems

Tom Lyche, Centre of Mathematics for Applications, Department of Informatics, University of Oslo. September 27, 2009.

Applications of SVD
- solving over-determined equations
- statistics, principal component analysis
- numerical determination of the rank of a matrix
- search engines (Google, ...)
- theory of matrices
- and lots of other applications...

Diagonalization
A square matrix $A$ can be diagonalized by a unitary similarity transformation if and only if it is normal:
$$U^H A U = D := \operatorname{diag}(\lambda_1, \dots, \lambda_n) \quad\text{or}\quad A = U D U^H.$$
If $U = [u_1, \dots, u_n]$ then $A u_j = \lambda_j u_j$ and $u_j^H u_k = \delta_{jk}$.
If $A$ is real and symmetric then $U$ and $D$ are real.
Today: Any matrix, even a rectangular one, can be diagonalized provided we allow two different unitary matrices.
$A = U \Sigma V^H$ is called a Singular Value Decomposition (SVD) if $\Sigma$ is a diagonal matrix of the same dimension as $A$, and $U$ and $V$ are square and unitary.
The diagonal entries of $\Sigma$ are called the singular values of the matrix.

Hermitian matrix
Recall:
Theorem. Suppose $A \in \mathbb{C}^{n,n}$ is Hermitian. Then $A$ has real eigenvalues $\lambda_1, \dots, \lambda_n$. Moreover, there is a unitary matrix $U \in \mathbb{C}^{n,n}$ such that
$$U^H A U = \operatorname{diag}(\lambda_1, \dots, \lambda_n).$$
For the columns $\{u_1, \dots, u_n\}$ of $U$ we have $A u_j = \lambda_j u_j$ for $j = 1, \dots, n$. Thus $\{u_1, \dots, u_n\}$ are orthonormal eigenvectors of $A$.

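Eigenvalues of $A^H A$
Lemma. Suppose $m, n \in \mathbb{N}$ and $A \in \mathbb{C}^{m,n}$. The matrix $A^H A$ has eigenpairs $(\lambda_j, v_j)$ for $j = 1, \dots, n$, where $v_j^H v_k = \delta_{jk}$ and $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n \ge 0$. Moreover
$$\sigma_j := \sqrt{\lambda_j} = \|A v_j\|_2, \quad j = 1, \dots, n. \tag{1}$$
Proof: $A^H A \in \mathbb{C}^{n,n}$ is Hermitian. It has real eigenvalues $\lambda_j$ and orthonormal eigenvectors $v_j$. Then
$$\|A v_j\|_2^2 = (A v_j)^H A v_j = v_j^H A^H A v_j = v_j^H \lambda_j v_j = \lambda_j \ge 0.$$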
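As a small numerical illustration of the lemma, the sketch below forms $A^T A$ for a real matrix with two columns and takes square roots of its eigenvalues, using the closed form for a symmetric 2-by-2 matrix. The helper names `gram` and `singular_values_two_columns` are made up for this example, and forming $A^T A$ explicitly is meant for hand-sized problems only.

```python
import math

def gram(A):
    """Return the n-by-n matrix A^T A for a real matrix A given as a list of rows."""
    n = len(A[0])
    return [[sum(row[i] * row[j] for row in A) for j in range(n)] for i in range(n)]

def singular_values_two_columns(A):
    """Singular values of a real m-by-2 matrix, largest first.

    Uses the closed form for the eigenvalues of the symmetric 2-by-2 matrix A^T A.
    """
    (a, b), (_, d) = gram(A)
    mean = (a + d) / 2
    radius = math.hypot((a - d) / 2, b)
    lam1 = mean + radius
    lam2 = max(mean - radius, 0.0)  # clamp tiny negative rounding noise
    return math.sqrt(lam1), math.sqrt(lam2)

# The 3-by-2 matrix (1/15) [14 2; 4 22; 16 13] used in the examples of these slides.
A = [[14 / 15, 2 / 15], [4 / 15, 22 / 15], [16 / 15, 13 / 15]]
sigma1, sigma2 = singular_values_two_columns(A)
print(sigma1, sigma2)  # approximately 2 and 1
```

Here $A^T A = \frac{1}{25}\begin{bmatrix} 52 & 36 \\ 36 & 73 \end{bmatrix}$ has eigenvalues 4 and 1, so the singular values are 2 and 1.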

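Definition (Singular Values)
The nonnegative square roots $\sigma_j := \sqrt{\lambda_j}$, $j = 1, \dots, n$, of the eigenvalues of $A^H A$ are called the singular values of $A \in \mathbb{C}^{m,n}$. We order them so that
$$\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r > 0 = \sigma_{r+1} = \cdots = \sigma_n,$$
and define
$$\Sigma := \begin{bmatrix} \Sigma_1 & 0_{r,n-r} \\ 0_{m-r,r} & 0_{m-r,n-r} \end{bmatrix} \in \mathbb{R}^{m,n}, \qquad \Sigma_1 := \operatorname{diag}(\sigma_1, \dots, \sigma_r),$$
where $0_{k,l} = [\,]$, the empty matrix, if $k = 0$ or $l = 0$. Equivalently, $\Sigma = [\sigma_1 e_1, \dots, \sigma_r e_r, 0, \dots, 0]$.
We will show that $r$ is the rank of $A$.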

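Examples
$$A := \frac{1}{25}\begin{bmatrix} 11 & 48 \\ 48 & 39 \end{bmatrix}, \qquad \Sigma = \begin{bmatrix} 3 & 0 \\ 0 & 1 \end{bmatrix} = \Sigma_1.$$
$$A := \frac{1}{15}\begin{bmatrix} 14 & 2 \\ 4 & 22 \\ 16 & 13 \end{bmatrix}, \qquad \Sigma = \begin{bmatrix} \Sigma_1 \\ 0_{1,2} \end{bmatrix} = \begin{bmatrix} 2 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}, \qquad \Sigma_1 = \begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix}.$$
$$A := \frac{1}{15}\begin{bmatrix} 14 & 4 & 16 \\ 2 & 22 & 13 \end{bmatrix}, \qquad \Sigma = \begin{bmatrix} \Sigma_1 & 0_{2,1} \end{bmatrix} = \begin{bmatrix} 2 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}, \qquad \Sigma_1 = \begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix}.$$
$$A = \begin{bmatrix} 1 & 1 \\ 1 & 1 \\ 0 & 0 \end{bmatrix}, \qquad \Sigma = \begin{bmatrix} \Sigma_1 & 0 \\ 0 & 0 \\ 0 & 0 \end{bmatrix} = \begin{bmatrix} 2 & 0 \\ 0 & 0 \\ 0 & 0 \end{bmatrix}, \qquad \Sigma_1 = [2].$$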

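The Singular Value Decomposition
Theorem (Existence of SVD). Let $m, n \in \mathbb{N}$ and suppose $A \in \mathbb{C}^{m,n}$ has $r$ nonzero singular values $\sigma_1 \ge \cdots \ge \sigma_r > 0 = \sigma_{r+1} = \cdots = \sigma_n$. Then $A$ has the singular value decomposition $A = U \Sigma V^H$, where $U \in \mathbb{C}^{m,m}$ and $V \in \mathbb{C}^{n,n}$ are unitary, and
$$\Sigma = \begin{bmatrix} \Sigma_1 & 0_{r,n-r} \\ 0_{m-r,r} & 0_{m-r,n-r} \end{bmatrix}, \qquad \Sigma_1 = \operatorname{diag}(\sigma_1, \dots, \sigma_r). \tag{2}$$
If $A$ is real then $A = U \Sigma V^T$, where $U \in \mathbb{R}^{m,m}$ and $V \in \mathbb{R}^{n,n}$ are orthogonal, and $\Sigma$ is given by (2).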

A useful result
From the eigenvectors of $A^H A$ we can derive orthonormal bases for the column space $\operatorname{span}(A)$ and null space $\ker(A)$ of $A$.
Theorem. Suppose $A \in \mathbb{C}^{m,n}$ and let $(\sigma_j^2, v_j)$ for $j = 1, \dots, n$ be orthonormal eigenpairs for $A^H A$. Suppose $r$ of the singular values are nonzero so that
$$\sigma_1 \ge \cdots \ge \sigma_r > 0 = \sigma_{r+1} = \cdots = \sigma_n. \tag{3}$$
Then $\{A v_1, \dots, A v_r\}$ is an orthogonal basis for the column space of $A$ and $\{v_{r+1}, \dots, v_n\}$ is an orthonormal basis for the nullspace of $A$.

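Part of Proof
For $j \ne k$,
$$(A v_j)^H A v_k = v_j^H A^H A v_k = v_j^H \lambda_k v_k = 0,$$
and $\{A v_1, \dots, A v_n\}$ is an orthogonal set.
Since $\sigma_j = \|A v_j\|_2$, we have $A v_j \ne 0 \iff j \in \{1, \dots, r\}$.
But then $\operatorname{span}([A v_1, \dots, A v_r]) \subset \operatorname{span}(A)$ and $\operatorname{span}([v_{r+1}, \dots, v_n]) \subset \ker(A)$.
For equality we need to show the opposite inclusions.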

Outline of existence proof
$V := [v_1, \dots, v_n] \in \mathbb{C}^{n,n}$, where $A^H A v_j = \lambda_j v_j$ and $\{v_1, \dots, v_n\}$ is orthonormal.
$\sigma_j := \sqrt{\lambda_j}$ for $j = 1, \dots, n$.
$u_j := \dfrac{A v_j}{\|A v_j\|_2} = \dfrac{1}{\sigma_j} A v_j$ for $j = 1, \dots, r$.
$U := [u_1, \dots, u_m] \in \mathbb{C}^{m,m}$, where $\{u_{r+1}, \dots, u_m\}$ is defined by extending $\{u_1, \dots, u_r\}$ to an orthonormal basis for $\mathbb{C}^m$.
$$U \Sigma = U[\sigma_1 e_1, \dots, \sigma_r e_r, 0, \dots, 0] = [\sigma_1 u_1, \dots, \sigma_r u_r, 0, \dots, 0] = [A v_1, \dots, A v_n] = A V.$$
Since $V$ is unitary we find $U \Sigma V^H = A V V^H = A$.
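The steps of the outline can be followed in exact rational arithmetic for the real matrix $A = \frac{1}{15}[14, 2; 4, 22; 16, 13]$. This is only a sketch: the eigenvectors $v_1, v_2$ of $A^T A$ are supplied by hand rather than computed, the third column $u_3$ is one hand-picked extension to an orthonormal basis, and the helper names are made up for this example.

```python
from fractions import Fraction as F

def matvec(M, x):
    """Matrix-vector product for a matrix given as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

A = [[F(14, 15), F(2, 15)], [F(4, 15), F(22, 15)], [F(16, 15), F(13, 15)]]

# Orthonormal eigenvectors of A^T A, supplied by hand (eigenvalues 4 and 1).
v = [[F(3, 5), F(4, 5)], [F(4, 5), F(-3, 5)]]
lam = [F(4), F(1)]
sigma = [F(2), F(1)]  # square roots of the eigenvalues

# Check the eigenpair claim: A^T A v_j = lambda_j v_j.
eigen_ok = all(
    matvec(transpose(A), matvec(A, vj)) == [lj * c for c in vj]
    for vj, lj in zip(v, lam)
)

# u_j := A v_j / sigma_j for j = 1, ..., r (here r = 2).
u = [[c / s for c in matvec(A, vj)] for vj, s in zip(v, sigma)]
# Extend {u_1, u_2} to an orthonormal basis of R^3 (hand-picked third vector).
u.append([F(2, 3), F(1, 3), F(-2, 3)])

# Gram matrix of the u_j; the identity means the columns of U are orthonormal.
gram_u = [[dot(ui, uj) for uj in u] for ui in u]
print(eigen_ok, u[0], u[1])
```

The computed vectors are $u_1 = \frac13[1, 2, 2]^T$ and $u_2 = \frac13[2, -2, 1]^T$.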

Uniqueness
The singular values are unique.
The matrices $U$ and $V$ are in general not unique.

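Examples
$$\frac{1}{25}\begin{bmatrix} 11 & 48 \\ 48 & 39 \end{bmatrix} = \frac{1}{5}\begin{bmatrix} 3 & -4 \\ 4 & 3 \end{bmatrix} \begin{bmatrix} 3 & 0 \\ 0 & 1 \end{bmatrix} \frac{1}{5}\begin{bmatrix} 3 & 4 \\ 4 & -3 \end{bmatrix}.$$
$$\frac{1}{15}\begin{bmatrix} 14 & 4 & 16 \\ 2 & 22 & 13 \end{bmatrix} = \frac{1}{5}\begin{bmatrix} 3 & 4 \\ 4 & -3 \end{bmatrix} \begin{bmatrix} 2 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} \frac{1}{3}\begin{bmatrix} 1 & 2 & 2 \\ 2 & -2 & 1 \\ 2 & 1 & -2 \end{bmatrix}.$$
$$\frac{1}{15}\begin{bmatrix} 14 & 2 \\ 4 & 22 \\ 16 & 13 \end{bmatrix} = \frac{1}{3}\begin{bmatrix} 1 & 2 & 2 \\ 2 & -2 & 1 \\ 2 & 1 & -2 \end{bmatrix} \begin{bmatrix} 2 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix} \frac{1}{5}\begin{bmatrix} 3 & 4 \\ 4 & -3 \end{bmatrix}.$$
$$\begin{bmatrix} 1 & 1 \\ 1 & 1 \\ 0 & 0 \end{bmatrix} = \begin{bmatrix} 1/\sqrt{2} & -1/\sqrt{2} & 0 \\ 1/\sqrt{2} & 1/\sqrt{2} & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 2 & 0 \\ 0 & 0 \\ 0 & 0 \end{bmatrix} \frac{1}{\sqrt{2}}\begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}.$$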
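The factorizations with rational entries can be checked exactly. The sketch below multiplies out $U \Sigma V^T$ for the 2-by-2 and the 3-by-2 example and confirms that $U$ and $V$ are orthogonal; the signs of the entries are the ones for which these checks pass, and the helper names are made up for this example.

```python
from fractions import Fraction as F

def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

def scaled(int_rows, denom):
    """Matrix with entries int_rows[i][j] / denom as exact fractions."""
    return [[F(a, denom) for a in row] for row in int_rows]

def identity(n):
    return [[F(int(i == j)) for j in range(n)] for i in range(n)]

# Tuples (A, U, Sigma, V) for the 2-by-2 and the 3-by-2 example.
examples = [
    (scaled([[11, 48], [48, 39]], 25),
     scaled([[3, -4], [4, 3]], 5),
     [[F(3), F(0)], [F(0), F(1)]],
     scaled([[3, 4], [4, -3]], 5)),
    (scaled([[14, 2], [4, 22], [16, 13]], 15),
     scaled([[1, 2, 2], [2, -2, 1], [2, 1, -2]], 3),
     [[F(2), F(0)], [F(0), F(1)], [F(0), F(0)]],
     scaled([[3, 4], [4, -3]], 5)),
]

checks = []
for A, U, S, V in examples:
    reproduces_A = matmul(matmul(U, S), transpose(V)) == A
    U_orthogonal = matmul(transpose(U), U) == identity(len(U))
    V_orthogonal = matmul(transpose(V), V) == identity(len(V))
    checks.append((reproduces_A, U_orthogonal, V_orthogonal))
print(checks)
```

The last example involves $\sqrt{2}$ and is therefore left out of this exact check.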

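r < n < m
Find the singular value decomposition of
$$A = \begin{bmatrix} 1 & 1 \\ 1 & 1 \\ 0 & 0 \end{bmatrix}.$$
$$B := A^T A = \begin{bmatrix} 2 & 2 \\ 2 & 2 \end{bmatrix}, \qquad B \begin{bmatrix} 1 \\ 1 \end{bmatrix} = 4 \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \qquad B \begin{bmatrix} 1 \\ -1 \end{bmatrix} = 0 \begin{bmatrix} 1 \\ -1 \end{bmatrix}.$$
Hence $\sigma_1 = 2$, $\sigma_2 = 0$, $v_1 = [1, 1]^T/\sqrt{2}$ and $v_2 = [1, -1]^T/\sqrt{2}$.
$u_1 = A v_1/\sigma_1 = s_1/\sqrt{2}$, where $s_1 = [1, 1, 0]^T$.
Extend $s_1$ to a basis $\{s_1, s_2, s_3\}$ for $\mathbb{R}^3$.
Apply Gram-Schmidt to $\{s_1, s_2, s_3\}$.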
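The Gram-Schmidt step can be written out for one particular extension. The choice $s_2 := e_2$, $s_3 := e_3$ below is an assumption made for illustration; any choice that makes $\{s_1, s_2, s_3\}$ a basis would do and may give a different $U$.

```latex
\begin{align*}
u_1 &= \frac{s_1}{\|s_1\|_2} = \frac{1}{\sqrt{2}}\,[1, 1, 0]^T, \\
w_2 &= s_2 - (u_1^T s_2)\,u_1 = [0, 1, 0]^T - \tfrac{1}{2}\,[1, 1, 0]^T = \tfrac{1}{2}\,[-1, 1, 0]^T,
  & u_2 &= \frac{w_2}{\|w_2\|_2} = \frac{1}{\sqrt{2}}\,[-1, 1, 0]^T, \\
w_3 &= s_3 - (u_1^T s_3)\,u_1 - (u_2^T s_3)\,u_2 = [0, 0, 1]^T,
  & u_3 &= [0, 0, 1]^T.
\end{align*}
```

With $U = [u_1, u_2, u_3]$, $\Sigma = [2, 0; 0, 0; 0, 0]$ and $V = [v_1, v_2]$ one obtains $U \Sigma V^T = 2\, u_1 v_1^T = A$.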

Comments on computing the SVD
The method we used to find the singular value decomposition in the previous example can be suitable for hand calculation with small matrices, but it is not appropriate as a general purpose numerical method.
In particular, the Gram-Schmidt orthogonalization process, which can be used to extend $u_1, \dots, u_r$ to an orthonormal basis, is not numerically stable.
Forming $A^H A$ can lead to extra errors in the computation.
State-of-the-art computer implementations of the singular value decomposition use an adapted version of the QR-algorithm where the matrix $A^H A$ is not formed. The QR-algorithm is discussed in Chapter 20.

SVD using Matlab
[U,S,V] = svd(A): the singular value decomposition.
s = svd(A): the singular values.
[U,S,V] = svd(A,0): economy size. If $m > n$ then $U \in \mathbb{C}^{m,n}$, $S \in \mathbb{R}^{n,n}$, and $V \in \mathbb{C}^{n,n}$ as before.

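The Singular Value Factorization (SVF)
SVD: $A = U \Sigma V^H$, with $U$, $V$ square.
$U = [u_1, \dots, u_m] = [U_1, U_2]$, $U_1 \in \mathbb{C}^{m,r}$, $U_2 \in \mathbb{C}^{m,m-r}$,
$V = [v_1, \dots, v_n] = [V_1, V_2]$, $V_1 \in \mathbb{C}^{n,r}$, $V_2 \in \mathbb{C}^{n,n-r}$,
$$A = [U_1, U_2] \begin{bmatrix} \Sigma_1 & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} V_1^H \\ V_2^H \end{bmatrix} = U_1 \Sigma_1 V_1^H,$$
the singular value factorization. (For real $A$ the blocks are real.)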

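Three forms of SVD
[Figure: block diagram of the shapes of the factors in $A = U \Sigma V^H$ ($m \times m$, $m \times n$, $n \times n$) and in $A = U_1 \Sigma_1 V_1^H$ ($m \times r$, $r \times r$, $r \times n$).]
$A = U \Sigma V^H$, SVD,
$A = U_1 \Sigma_1 V_1^H$, SVF,
$A = \sum_{i=1}^{r} \sigma_i u_i v_i^H = \sigma_1 u_1 v_1^H + \cdots + \sigma_r u_r v_r^H$, outer product form.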
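As a worked instance of the outer product form, take the 3-by-2 example with $\sigma_1 = 2$, $\sigma_2 = 1$, $u_1 = \frac13[1, 2, 2]^T$, $u_2 = \frac13[2, -2, 1]^T$, $v_1 = \frac15[3, 4]^T$ and $v_2 = \frac15[4, -3]^T$:

```latex
\begin{align*}
\sigma_1 u_1 v_1^T + \sigma_2 u_2 v_2^T
 &= \frac{2}{15}\begin{bmatrix} 1 \\ 2 \\ 2 \end{bmatrix}\begin{bmatrix} 3 & 4 \end{bmatrix}
  + \frac{1}{15}\begin{bmatrix} 2 \\ -2 \\ 1 \end{bmatrix}\begin{bmatrix} 4 & -3 \end{bmatrix} \\
 &= \frac{1}{15}\begin{bmatrix} 6 & 8 \\ 12 & 16 \\ 12 & 16 \end{bmatrix}
  + \frac{1}{15}\begin{bmatrix} 8 & -6 \\ -8 & 6 \\ 4 & -3 \end{bmatrix}
  = \frac{1}{15}\begin{bmatrix} 14 & 2 \\ 4 & 22 \\ 16 & 13 \end{bmatrix} = A.
\end{align*}
```

Each term is a rank-one matrix, and only the $r = 2$ nonzero singular values contribute.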

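Normal and positive semidefinite matrices
$\sigma_j = |\lambda_j|$ if $A^H A = A A^H$, with the eigenvalues ordered so that $|\lambda_1| \ge \cdots \ge |\lambda_n|$.
$\sigma_j = \lambda_j$ if $A$ is symmetric positive semidefinite.
Spectral decomposition: $A = U D U^H$, $D = \operatorname{diag}(\lambda_1, \dots, \lambda_n)$ and $U^H U = I$.
$A^H A = U D^H D U^H$, $D^H D = \operatorname{diag}(|\lambda_1|^2, \dots, |\lambda_n|^2)$.
For a positive semidefinite matrix the factorization $A = U D U^H$ above is an SVD provided we have sorted the eigenvalues in nonincreasing order. It is also an SVF when $A$ is nonsingular, so that $r = n$.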
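The symmetric matrix $\frac{1}{25}[11, 48; 48, 39]$ from the examples is normal but not positive semidefinite, and it illustrates $\sigma_j = |\lambda_j|$:

```latex
\begin{align*}
\operatorname{tr}(A) &= \tfrac{11 + 39}{25} = 2, \qquad
\det(A) = \tfrac{11 \cdot 39 - 48^2}{25^2} = \tfrac{-1875}{625} = -3, \\
\lambda^2 - 2\lambda - 3 &= 0 \;\Longrightarrow\; \lambda_1 = 3,\ \lambda_2 = -1, \qquad
\sigma_1 = |\lambda_1| = 3,\ \sigma_2 = |\lambda_2| = 1.
\end{align*}
```

Because $\lambda_2 < 0$, the spectral decomposition $A = U D U^T$ is not itself an SVD here; the sign of $\lambda_2$ has to be moved into one of the two orthogonal factors.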

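Geometric Interpretation
[Figure: left, the unit circle $S$ in the $(x_1, x_2)$-plane; right, its image $AS$ in the $(y_1, y_2)$-plane, with semi-axes of lengths $\sigma_1$ and $\sigma_2$ along $u_1$ and $u_2$.]
$$A := \frac{1}{25}\begin{bmatrix} 11 & 48 \\ 48 & 39 \end{bmatrix} \in \mathbb{R}^{2,2}, \qquad U = \frac{1}{5}\begin{bmatrix} 3 & -4 \\ 4 & 3 \end{bmatrix},$$
$$AS = \{x : \|\Sigma^{-1} U^T x\|_2^2 = 1\} = \Big\{(x_1, x_2) : \frac{(\frac35 x_1 + \frac45 x_2)^2}{9} + \frac{(-\frac45 x_1 + \frac35 x_2)^2}{1} = 1\Big\}.$$
$AS$ is an ellipse.
The singular values give the lengths of the semi-axes.
The semi-axes are along the left singular vectors.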
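A quick numerical check of this description: the sketch below maps points of the unit circle through $A$ and evaluates $\|\Sigma^{-1} U^T y\|_2^2 - 1$ for each image point $y$. The function name `ellipse_residual` and the sampling of 360 points are choices made for this example.

```python
import math

A = [[11 / 25, 48 / 25], [48 / 25, 39 / 25]]
U = [[3 / 5, -4 / 5], [4 / 5, 3 / 5]]  # columns are the left singular vectors
sigma = (3.0, 1.0)

def ellipse_residual(y):
    """Return ||Sigma^{-1} U^T y||_2^2 - 1, which is zero for y on the image AS."""
    c1 = U[0][0] * y[0] + U[1][0] * y[1]  # u_1^T y
    c2 = U[0][1] * y[0] + U[1][1] * y[1]  # u_2^T y
    return (c1 / sigma[0]) ** 2 + (c2 / sigma[1]) ** 2 - 1.0

max_residual = 0.0
for k in range(360):
    t = math.radians(k)
    x = (math.cos(t), math.sin(t))            # point on the unit circle S
    y = (A[0][0] * x[0] + A[0][1] * x[1],
         A[1][0] * x[0] + A[1][1] * x[1])     # its image Ax
    max_residual = max(max_residual, abs(ellipse_residual(y)))
print(max_residual)  # expected to be at rounding-error level
```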

Singular vectors
The columns $u_1, \dots, u_m$ of $U$ are called left singular vectors and the columns $v_1, \dots, v_n$ of $V$ are called right singular vectors. They satisfy (4):
1. $A V_1 = U_1 \Sigma_1$, or $A v_i = \sigma_i u_i$ for $i = 1, \dots, r$,
2. $A V_2 = 0$, or $A v_i = 0$ for $i = r+1, \dots, n$,
3. $A^H U_1 = V_1 \Sigma_1$, or $A^H u_i = \sigma_i v_i$ for $i = 1, \dots, r$,
4. $A^H U_2 = 0$, or $A^H u_i = 0$ for $i = r+1, \dots, m$.
Consequently (5):
1. $U_1$ is an orthonormal basis for $\operatorname{span}(A)$,
2. $V_2$ is an orthonormal basis for $\ker(A)$,
3. $V_1$ is an orthonormal basis for $\operatorname{span}(A^H)$,
4. $U_2$ is an orthonormal basis for $\ker(A^H)$.

SVD of $A^H A$ and $A A^H$
$A = U \Sigma V^H = U_1 \Sigma_1 V_1^H$, (SVD and SVF)
$A^H A = V \Sigma^H \Sigma V^H = V_1 \Sigma_1^2 V_1^H$, (SVD and SVF)
$A^H A V_1 = V_1 \Sigma_1^2$, $\quad A^H A V_2 = V_1 \Sigma_1^2 V_1^H V_2 = 0$,
$A A^H = U \Sigma \Sigma^H U^H = U_1 \Sigma_1^2 U_1^H$, (SVD and SVF)
$A A^H U_1 = U_1 \Sigma_1^2$, $\quad A A^H U_2 = 0$.

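Rank and nullity relations
Corollary. Suppose $A \in \mathbb{C}^{m,n}$. Then
$\operatorname{rank}(A) + \operatorname{null}(A) = n$,
$\operatorname{rank}(A) + \operatorname{null}(A^H) = m$,
$\operatorname{rank}(A) = \operatorname{rank}(A^H)$.
Theorem. For any $A \in \mathbb{C}^{m,n}$ we have
$\operatorname{rank}(A) = \operatorname{rank}(A^H A) = \operatorname{rank}(A A^H)$,
$\operatorname{null}(A^H A) = \operatorname{null}(A)$ and $\operatorname{null}(A A^H) = \operatorname{null}(A^H)$,
$\operatorname{span}(A^H A) = \operatorname{span}(A^H)$ and $\ker(A^H A) = \ker(A)$.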

The Pseudoinverse
The pseudo-inverse of $A \in \mathbb{C}^{m,n}$ is the matrix $A^\dagger \in \mathbb{C}^{n,m}$ given by
$$A^\dagger := V_1 \Sigma_1^{-1} U_1^H,$$
where $A = U_1 \Sigma_1 V_1^H$ is the singular value factorization of $A$.
$A^\dagger$ is independent of the factorization chosen to represent it.
If $A$ is square and nonsingular then $A^\dagger A = A A^\dagger = I$ and $A^\dagger$ is the usual inverse of $A$.
Any matrix has a pseudo-inverse, and so $A^\dagger$ is a generalization of the usual inverse.

How to find the pseudoinverse
1. Find the SVF of $A$.
2. If $A \in \mathbb{C}^{m,n}$ has rank $n$ then $A^\dagger = (A^H A)^{-1} A^H$.
3. If $B \in \mathbb{C}^{n,m}$ satisfies $ABA = A$, $BAB = B$, $(BA)^H = BA$, $(AB)^H = AB$, then $B = A^\dagger$.
4. Use MATLAB: B = pinv(A).

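Example
$$A = \frac{1}{15}\begin{bmatrix} 14 & 2 \\ 4 & 22 \\ 16 & 13 \end{bmatrix} = \frac{1}{3}\begin{bmatrix} 1 & 2 & 2 \\ 2 & -2 & 1 \\ 2 & 1 & -2 \end{bmatrix} \begin{bmatrix} 2 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix} \frac{1}{5}\begin{bmatrix} 3 & 4 \\ 4 & -3 \end{bmatrix},$$
$$A^\dagger = \frac{1}{5}\begin{bmatrix} 3 & 4 \\ 4 & -3 \end{bmatrix} \begin{bmatrix} 1/2 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} \frac{1}{3}\begin{bmatrix} 1 & 2 & 2 \\ 2 & -2 & 1 \\ 2 & 1 & -2 \end{bmatrix},$$
$$A^\dagger = B = \frac{1}{30}\begin{bmatrix} 19 & -10 & 14 \\ -8 & 20 & 2 \end{bmatrix}.$$
$B$ satisfies $ABA = A$, $BAB = B$, $(BA)^H = BA$, $(AB)^H = AB$.
Alternatively, since $A$ has rank 2,
$$A^T A = \frac{1}{25}\begin{bmatrix} 52 & 36 \\ 36 & 73 \end{bmatrix}, \qquad A^\dagger = (A^T A)^{-1} A^T = \frac{1}{100}\begin{bmatrix} 73 & -36 \\ -36 & 52 \end{bmatrix} \frac{1}{15}\begin{bmatrix} 14 & 4 & 16 \\ 2 & 22 & 13 \end{bmatrix}.$$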
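The four conditions characterizing the pseudo-inverse can be verified exactly for this example. The sketch below uses rational arithmetic from the Python standard library; the helper names and the dictionary `penrose` are made up for this illustration.

```python
from fractions import Fraction as F

def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

A = [[F(14, 15), F(2, 15)], [F(4, 15), F(22, 15)], [F(16, 15), F(13, 15)]]
B = [[F(19, 30), F(-10, 30), F(14, 30)], [F(-8, 30), F(20, 30), F(2, 30)]]

AB = matmul(A, B)  # 3-by-3
BA = matmul(B, A)  # 2-by-2

# The matrices are real, so the conjugate transpose is the plain transpose.
penrose = {
    "ABA = A": matmul(AB, A) == A,
    "BAB = B": matmul(BA, B) == B,
    "(BA)^T = BA": transpose(BA) == BA,
    "(AB)^T = AB": transpose(AB) == AB,
}
print(penrose)
```

Since $A$ has full column rank, $BA$ is the 2-by-2 identity, while $AB$ is the orthogonal projection onto $\operatorname{span}(A)$.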

Theory; Direct sum and Orthogonal Sum
Suppose $S$ and $T$ are subspaces of a vector space $(V, \mathbb{F})$. We define
Sum: $X := S + T := \{s + t : s \in S \text{ and } t \in T\}$;
Direct Sum: If $S \cap T = \{0\}$, then $S \oplus T := S + T$.
Orthogonal Sum: Suppose $(V, \mathbb{F}, \langle \cdot, \cdot \rangle)$ is an inner product space. Then $S \oplus T$ is an orthogonal sum if $\langle s, t \rangle = 0$ for all $s \in S$ and all $t \in T$.
$\operatorname{span}(A) \oplus \ker(A^H)$ is an orthogonal sum with respect to the usual inner product $\langle s, t \rangle := s^H t$. For if $y = Ax \in \operatorname{span}(A)$ and $z \in \ker(A^H)$ then $y^H z = (Ax)^H z = x^H (A^H z) = 0$.
Orthogonal complement: $T = S^\perp := \{x \in X : \langle s, x \rangle = 0 \text{ for all } s \in S\}$.

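Basic facts
Suppose $S$ and $T$ are subspaces of a vector space $(V, \mathbb{F})$. Then
$S + T = T + S$ and $S + T$ is a subspace of $V$.
$\dim(S + T) = \dim S + \dim T - \dim(S \cap T)$.
$\dim(S \oplus T) = \dim S + \dim T$.
$\mathbb{C}^m = \operatorname{span}(A) \oplus \ker(A^H)$.
Every $v \in S \oplus T$ can be decomposed uniquely as $v = s + t$, where $s \in S$ and $t \in T$. If $S \oplus T$ is an orthogonal sum then $s$ is called the orthogonal projection of $v$ into $S$.
[Figure: a vector $v$ decomposed as $v = s + t$, with $s$ its orthogonal projection into $S$.]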

Orthogonal Projections
The singular value decomposition and the pseudo-inverse can be used to compute orthogonal projections into the subspaces $\operatorname{span}(A)$ and $\ker(A^H)$.
Recall that if $A = U \Sigma V^H$ is the SVD of $A$ and $U = [U_1, U_2]$ as before, then $U_1$ is an orthonormal basis for $\operatorname{span}(A)$ and $U_2$ is an orthonormal basis for $\ker(A^H)$.
Let $b \in \mathbb{C}^m$. Then
$$b = U U^H b = [U_1\ U_2] \begin{bmatrix} U_1^H \\ U_2^H \end{bmatrix} b = U_1 U_1^H b + U_2 U_2^H b =: b_1 + b_2.$$
$b_1 := U_1 (U_1^H b) \in \operatorname{span}(A)$ is the orthogonal projection into $\operatorname{span}(A)$, and $b_1 = A A^\dagger b$.
$b_2 := U_2 (U_2^H b) \in \ker(A^H)$ is the orthogonal projection into $\ker(A^H)$, and $b_2 = (I - A A^\dagger) b$.

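Example
The singular value decomposition of
$$A = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix}$$
is $A = I_3 A I_2$. Hence
$$A^\dagger = I_2 \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} I_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix},$$
$$A A^\dagger = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}, \qquad I_3 - A A^\dagger = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}.$$
If $b = [b_1, b_2, b_3]^T$, then the two projections are $A A^\dagger b = [b_1, b_2, 0]^T$ and $(I_3 - A A^\dagger) b = [0, 0, b_3]^T$.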
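The same decomposition can be tried on a less trivial matrix. The sketch below uses the 3-by-2 matrix $\frac{1}{15}[14, 2; 4, 22; 16, 13]$ with left singular vectors $U_1 = \frac13[1, 2; 2, -2; 2, 1]$ and $U_2 = \frac13[2, 1, -2]^T$, and splits an arbitrarily chosen vector $b$ into its projections; the helper `project` and the test vector are assumptions of this example.

```python
from fractions import Fraction as F

def matvec(M, x):
    """Matrix-vector product for a matrix given as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in M]

def transpose(X):
    return [list(col) for col in zip(*X)]

A = [[F(14, 15), F(2, 15)], [F(4, 15), F(22, 15)], [F(16, 15), F(13, 15)]]
U1 = [[F(1, 3), F(2, 3)], [F(2, 3), F(-2, 3)], [F(2, 3), F(1, 3)]]  # basis for span(A)
U2 = [[F(2, 3)], [F(1, 3)], [F(-2, 3)]]                             # basis for ker(A^T)

def project(Q, b):
    """Orthogonal projection Q (Q^T b) onto the span of the orthonormal columns of Q."""
    return matvec(Q, matvec(transpose(Q), b))

b = [F(1), F(2), F(3)]  # arbitrary test vector
b1 = project(U1, b)     # component in span(A)
b2 = project(U2, b)     # component in ker(A^T)

recombined = [p + q for p, q in zip(b1, b2)]
residual = matvec(transpose(A), b2)  # A^T b2, zero when b2 lies in ker(A^T)
print(b1, b2, residual)
```

For this $b$ the projections are $b_1 = \frac19[13, 20, 23]^T$ and $b_2 = \frac19[-4, -2, 4]^T$.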

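Minmax and Maxmin Theorems
$$R(x) = R_A(x) := \frac{x^H A x}{x^H x}.$$
Theorem (The Courant-Fischer Theorem). Suppose $A \in \mathbb{C}^{n,n}$ is Hermitian with eigenvalues $\lambda_1, \lambda_2, \dots, \lambda_n$ ordered so that $\lambda_1 \ge \cdots \ge \lambda_n$. Then for $k = 1, \dots, n$
$$\lambda_k = \min_{\dim(S) = n-k+1}\ \max_{x \in S,\ x \ne 0} R(x) = \max_{\dim(S) = k}\ \min_{x \in S,\ x \ne 0} R(x). \tag{6}$$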

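Minmax and Maxmin Theorems for singular values
Theorem (The Courant-Fischer Theorem for Singular Values). Suppose $A \in \mathbb{C}^{m,n}$ has singular values $\sigma_1, \sigma_2, \dots, \sigma_n$ ordered so that $\sigma_1 \ge \cdots \ge \sigma_n$. Then for $k = 1, \dots, n$
$$\sigma_k = \min_{\dim(S) = n-k+1}\ \max_{x \in S,\ x \ne 0} \frac{\|Ax\|_2}{\|x\|_2} = \max_{\dim(S) = k}\ \min_{x \in S,\ x \ne 0} \frac{\|Ax\|_2}{\|x\|_2}.$$
Proof:
$$\frac{\|Ax\|_2^2}{\|x\|_2^2} = \frac{\langle Ax, Ax \rangle}{\langle x, x \rangle} = \frac{\langle x, A^H A x \rangle}{\langle x, x \rangle} = R_{A^H A}(x).$$
The result follows by applying the Courant-Fischer theorem to the Hermitian matrix $A^H A$, whose eigenvalues are $\sigma_k^2$.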

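The largest and smallest singular value
$$\sigma_1 = \max_{x \in \mathbb{C}^n,\ x \ne 0} \frac{\|Ax\|_2}{\|x\|_2} = \max_{x \in \mathbb{C}^n,\ \|x\|_2 = 1} \|Ax\|_2,$$
$$\sigma_n = \min_{x \in \mathbb{C}^n,\ x \ne 0} \frac{\|Ax\|_2}{\|x\|_2} = \min_{x \in \mathbb{C}^n,\ \|x\|_2 = 1} \|Ax\|_2.$$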
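For a real 2-by-2 matrix the two extremal characterizations can be explored by brute force over real unit vectors $x = (\cos\theta, \sin\theta)$. The sketch below does this for $A = \frac{1}{25}[11, 48; 48, 39]$, whose singular values are 3 and 1; the sampling density of 3600 angles is an arbitrary choice, so the results are estimates only.

```python
import math

A = [[11 / 25, 48 / 25], [48 / 25, 39 / 25]]  # singular values 3 and 1

def norm_Ax(theta):
    """Return ||A x||_2 for the unit vector x = (cos(theta), sin(theta))."""
    x1, x2 = math.cos(theta), math.sin(theta)
    return math.hypot(A[0][0] * x1 + A[0][1] * x2,
                      A[1][0] * x1 + A[1][1] * x2)

samples = [norm_Ax(2 * math.pi * k / 3600) for k in range(3600)]
sigma_max_estimate = max(samples)  # estimate of sigma_1, from below
sigma_min_estimate = min(samples)  # estimate of sigma_n, from above
print(sigma_max_estimate, sigma_min_estimate)  # close to 3 and 1
```

Sampling can only underestimate the maximum and overestimate the minimum, so the estimates bracket the interval $[\sigma_n, \sigma_1]$ from the inside.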

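Hoffman-Wielandt Theorem
Theorem (Eigenvalues). Suppose $A, B \in \mathbb{C}^{n,n}$ are both Hermitian matrices with eigenvalues $\lambda_1 \ge \cdots \ge \lambda_n$ and $\mu_1 \ge \cdots \ge \mu_n$, respectively. Then
$$\sum_{j=1}^{n} |\mu_j - \lambda_j|^2 \le \|A - B\|_F^2 := \sum_{i=1}^{n} \sum_{j=1}^{n} |a_{ij} - b_{ij}|^2.$$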

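Hoffman-Wielandt Theorem
Theorem (Singular values). For any $m, n \in \mathbb{N}$ and $A, B \in \mathbb{C}^{m,n}$ we have
$$\sum_{j=1}^{n} |\beta_j - \alpha_j|^2 \le \|A - B\|_F^2,$$
where $\alpha_1 \ge \cdots \ge \alpha_n$ and $\beta_1 \ge \cdots \ge \beta_n$ are the singular values of $A$ and $B$, respectively.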

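Proof, part 1
$$C := \begin{bmatrix} 0 & A \\ A^H & 0 \end{bmatrix} \quad\text{and}\quad D := \begin{bmatrix} 0 & B \\ B^H & 0 \end{bmatrix} \in \mathbb{C}^{m+n,m+n}.$$
$C^H = C$ and $D^H = D$.
If $C$ and $D$ have eigenvalues $\lambda_1 \ge \cdots \ge \lambda_{m+n}$ and $\mu_1 \ge \cdots \ge \mu_{m+n}$, respectively, then by the Hoffman-Wielandt theorem for eigenvalues
$$\sum_{j=1}^{m+n} |\lambda_j - \mu_j|^2 \le \|C - D\|_F^2.$$
Suppose $A$ has rank $r$ and SVD $U \Sigma V^H$. Then
$A v_i = \alpha_i u_i$, $A^H u_i = \alpha_i v_i$ for $i = 1, \dots, r$,
$A^H u_i = 0$ for $i = r+1, \dots, m$, and $A v_i = 0$ for $i = r+1, \dots, n$.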

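Proof, part 2
$$\begin{bmatrix} 0 & A \\ A^H & 0 \end{bmatrix} \begin{bmatrix} u_i \\ v_i \end{bmatrix} = \begin{bmatrix} A v_i \\ A^H u_i \end{bmatrix} = \begin{bmatrix} \alpha_i u_i \\ \alpha_i v_i \end{bmatrix} = \alpha_i \begin{bmatrix} u_i \\ v_i \end{bmatrix}, \quad i = 1, \dots, r,$$
$$\begin{bmatrix} 0 & A \\ A^H & 0 \end{bmatrix} \begin{bmatrix} u_i \\ -v_i \end{bmatrix} = \begin{bmatrix} -A v_i \\ A^H u_i \end{bmatrix} = \begin{bmatrix} -\alpha_i u_i \\ \alpha_i v_i \end{bmatrix} = -\alpha_i \begin{bmatrix} u_i \\ -v_i \end{bmatrix}, \quad i = 1, \dots, r,$$
$$\begin{bmatrix} 0 & A \\ A^H & 0 \end{bmatrix} \begin{bmatrix} u_i \\ 0 \end{bmatrix} = \begin{bmatrix} 0 \\ A^H u_i \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} = 0 \begin{bmatrix} u_i \\ 0 \end{bmatrix}, \quad i = r+1, \dots, m,$$
$$\begin{bmatrix} 0 & A \\ A^H & 0 \end{bmatrix} \begin{bmatrix} 0 \\ v_i \end{bmatrix} = \begin{bmatrix} A v_i \\ 0 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} = 0 \begin{bmatrix} 0 \\ v_i \end{bmatrix}, \quad i = r+1, \dots, n.$$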

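Proof, part 3
Thus $C$ has the $2r$ eigenvalues $\alpha_1, -\alpha_1, \dots, \alpha_r, -\alpha_r$ and $m + n - 2r$ additional zero eigenvalues.
Similarly, if $B$ has rank $s$ then $D$ has the $2s$ eigenvalues $\beta_1, -\beta_1, \dots, \beta_s, -\beta_s$ and $m + n - 2s$ additional zero eigenvalues.
Let $t := \max(r, s)$. Then the ordered eigenvalues are
$$\lambda_1 \ge \cdots \ge \lambda_{m+n}: \quad \alpha_1 \ge \cdots \ge \alpha_t \ge 0 = \cdots = 0 \ge -\alpha_t \ge \cdots \ge -\alpha_1,$$
$$\mu_1 \ge \cdots \ge \mu_{m+n}: \quad \beta_1 \ge \cdots \ge \beta_t \ge 0 = \cdots = 0 \ge -\beta_t \ge \cdots \ge -\beta_1.$$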

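Proof, part 4
$$\sum_{j=1}^{m+n} |\lambda_j - \mu_j|^2 = \sum_{i=1}^{t} |\alpha_i - \beta_i|^2 + \sum_{i=1}^{t} |-\alpha_i + \beta_i|^2 = 2 \sum_{i=1}^{t} |\alpha_i - \beta_i|^2,$$
$$\|C - D\|_F^2 = \left\| \begin{bmatrix} 0 & A - B \\ A^H - B^H & 0 \end{bmatrix} \right\|_F^2 = \|B - A\|_F^2 + \|(B - A)^H\|_F^2 = 2 \|B - A\|_F^2.$$
Hence
$$\sum_{i=1}^{t} |\alpha_i - \beta_i|^2 = \frac12 \sum_{j=1}^{m+n} |\lambda_j - \mu_j|^2 \le \frac12 \|C - D\|_F^2 = \|B - A\|_F^2.$$
Since $t \le n$ and $\alpha_i = \beta_i = 0$ for $i = t+1, \dots, n$ we obtain the result.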
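The inequality for singular values can be tried on a small real example. In the sketch below, $A = \frac{1}{25}[11, 48; 48, 39]$ is the matrix from the examples, while $B = \operatorname{diag}(2, 1)$ is a hypothetical comparison matrix chosen only for illustration; singular values are obtained from the closed-form eigenvalues of the 2-by-2 matrix $M^T M$, and the function name is made up.

```python
import math

def singular_values_2x2(M):
    """Singular values of a real 2-by-2 matrix, largest first (closed form via M^T M)."""
    (a, b), (c, d) = M
    p = a * a + c * c  # (M^T M)[0][0]
    q = a * b + c * d  # (M^T M)[0][1]
    r = b * b + d * d  # (M^T M)[1][1]
    mean = (p + r) / 2
    radius = math.hypot((p - r) / 2, q)
    return math.sqrt(mean + radius), math.sqrt(max(mean - radius, 0.0))

A = [[11 / 25, 48 / 25], [48 / 25, 39 / 25]]  # singular values 3 and 1
B = [[2.0, 0.0], [0.0, 1.0]]                  # hypothetical comparison matrix

alpha = singular_values_2x2(A)
beta = singular_values_2x2(B)

# Left-hand side: sum of squared differences of the ordered singular values.
lhs = sum((bj - aj) ** 2 for aj, bj in zip(alpha, beta))
# Right-hand side: squared Frobenius norm of A - B.
rhs = sum((A[i][j] - B[i][j]) ** 2 for i in range(2) for j in range(2))
print(lhs, rhs)
```

Here the left-hand side is $(2 - 3)^2 + (1 - 1)^2 = 1$ and the right-hand side is $6325/625 = 10.12$, consistent with the theorem.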