The Singular Value Decomposition and Least Squares Problems


The Singular Value Decomposition and Least Squares Problems
Tom Lyche
Centre of Mathematics for Applications, Department of Informatics, University of Oslo
September 27, 2009

Applications of SVD
- solving over-determined equations
- statistics, principal component analysis
- numerical determination of the rank of a matrix
- search engines (Google, ...)
- theory of matrices
- and lots of other applications ...

Diagonalization

A square matrix A can be diagonalized by a unitary similarity transformation if and only if it is normal:

U^H A U = D := diag(λ_1, ..., λ_n), or A = U D U^H.

If U = [u_1, ..., u_n] then A u_j = λ_j u_j and u_j^H u_k = δ_{jk}. If A is real and symmetric then U and D are real.

Today: any matrix, even a rectangular one, can be diagonalized provided we allow two different unitary matrices. A = UΣV^H is called a Singular Value Decomposition (SVD) if Σ is a diagonal matrix of the same dimension as A, and U and V are square and unitary. The diagonal entries of Σ are called the singular values of the matrix.
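As a quick numerical illustration (a minimal MATLAB sketch with made-up example matrices): eig unitarily diagonalizes a symmetric matrix, while svd diagonalizes a rectangular one using two different orthogonal factors.

```matlab
A = [2 1; 1 3];          % real symmetric, hence normal
[U,D] = eig(A);          % A = U*D*U' with U orthogonal
norm(A - U*D*U')         % ~ 0

B = [1 2 3; 4 5 6];      % rectangular, 2 x 3
[P,S,Q] = svd(B);        % B = P*S*Q', P (2x2) and Q (3x3) orthogonal
norm(B - P*S*Q')         % ~ 0
```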

Hermitian matrix

Recall: Theorem. Suppose A ∈ C^{n,n} is Hermitian. Then A has real eigenvalues λ_1, ..., λ_n. Moreover, there is a unitary matrix U ∈ C^{n,n} such that U^H A U = diag(λ_1, ..., λ_n). For the columns u_1, ..., u_n of U we have A u_j = λ_j u_j for j = 1, ..., n. Thus {u_1, ..., u_n} are orthonormal eigenvectors of A.

Eigenvalues of A^H A

Lemma. Suppose m, n ∈ N and A ∈ C^{m,n}. The matrix A^H A has eigenpairs (λ_j, v_j) for j = 1, ..., n, where v_j^H v_k = δ_{jk} and λ_1 ≥ λ_2 ≥ ... ≥ λ_n ≥ 0. Moreover,

σ_j := √λ_j = ‖A v_j‖_2, for j = 1, ..., n.    (1)

Proof: A^H A ∈ C^{n,n} is Hermitian, so it has real eigenvalues λ_j and orthonormal eigenvectors v_j. Moreover, ‖A v_j‖_2^2 = (A v_j)^H A v_j = v_j^H A^H A v_j = λ_j v_j^H v_j = λ_j ≥ 0.
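The lemma is easy to check numerically; this MATLAB sketch (example matrix assumed) compares the square roots of the eigenvalues of A^H A, the norms ‖A v_j‖_2, and the singular values reported by svd.

```matlab
A = [1 2; 3 4; 5 6];
[V,L] = eig(A'*A);                    % eigenpairs of A'*A
[lam,idx] = sort(diag(L),'descend');  % order lambda_1 >= ... >= lambda_n
V = V(:,idx);
sqrt(lam)'                            % sigma_j = sqrt(lambda_j)
sqrt(sum((A*V).^2))                   % ||A*v_j||_2, the same values
svd(A)'                               % agrees up to rounding
```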

Definition (Singular Values)

The nonnegative square roots σ_j := √λ_j, j = 1, ..., n, of the eigenvalues of A^H A are called the singular values of A ∈ C^{m,n}. They are ordered so that σ_1 ≥ σ_2 ≥ ... ≥ σ_r > 0 = σ_{r+1} = ... = σ_n. Define

Σ := [Σ_1 0_{r,n−r}; 0_{m−r,r} 0_{m−r,n−r}] ∈ R^{m,n},  Σ_1 := diag(σ_1, ..., σ_r),

where 0_{k,l} = [], the empty matrix, if k = 0 or l = 0. In column form, Σ = [σ_1 e_1, ..., σ_r e_r, 0, ..., 0]. We will show that r is the rank of A.

Examples

A := (1/25)[11 48; 48 39],  Σ = [3 0; 0 1] = Σ_1.

A := (1/15)[14 2; 4 22; 16 13],  Σ = [2 0; 0 1; 0 0] = [Σ_1; 0_{1,2}],  Σ_1 = [2 0; 0 1].

A := (1/15)[14 4 16; 2 22 13],  Σ = [2 0 0; 0 1 0],  Σ_1 = [2 0; 0 1].

A := [1 1; 1 1; 0 0],  Σ = [2 0; 0 0; 0 0] = [Σ_1 0; 0 0],  Σ_1 = [2].
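The four examples can be verified in MATLAB (a small check, nothing more):

```matlab
svd([11 48; 48 39]/25)'        % 3 1
svd([14 2; 4 22; 16 13]/15)'   % 2 1
svd([14 4 16; 2 22 13]/15)'    % 2 1
svd([1 1; 1 1; 0 0])'          % 2 0
```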

The Singular Value Decomposition

Theorem (Existence of SVD). Let m, n ∈ N and suppose A ∈ C^{m,n} has r nonzero singular values σ_1 ≥ ... ≥ σ_r > 0 = σ_{r+1} = ... = σ_n. Then A has the singular value decomposition A = UΣV^H, where U ∈ C^{m,m} and V ∈ C^{n,n} are unitary, and

Σ = [Σ_1 0_{r,n−r}; 0_{m−r,r} 0_{m−r,n−r}],  Σ_1 = diag(σ_1, ..., σ_r).    (2)

If A is real then A = UΣV^T, where U ∈ R^{m,m} and V ∈ R^{n,n} are orthogonal, and Σ is given by (2).

A useful result

From the eigenvectors of A^H A we can derive orthonormal bases for the column space span(A) and the null space ker(A) of A.

Theorem. Suppose A ∈ C^{m,n} and let (σ_j^2, v_j) for j = 1, ..., n be orthonormal eigenpairs of A^H A. Suppose r of the singular values are nonzero, so that

σ_1 ≥ ... ≥ σ_r > 0 = σ_{r+1} = ... = σ_n.    (3)

Then {Av_1, ..., Av_r} is an orthogonal basis for the column space of A and {v_{r+1}, ..., v_n} is an orthonormal basis for the null space of A.
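A small MATLAB illustration of the theorem (rank-one example matrix assumed; orth and null return orthonormal bases for the column space and null space):

```matlab
A = [1 1 2; 1 1 2; 0 0 0];                % rank 1
[V,L] = eig(A'*A);
[~,idx] = sort(diag(L),'descend');  V = V(:,idx);
r = rank(A);
rank([A*V(:,1:r), orth(A)])               % = r: A*v_1,...,A*v_r span span(A)
norm(A*V(:,r+1:end))                      % ~ 0: v_{r+1},...,v_n lie in ker(A)
```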

Part of proof

For j ≠ k we have (Av_j)^H (Av_k) = v_j^H A^H A v_k = λ_k v_j^H v_k = 0, so {Av_1, ..., Av_n} is an orthogonal set. Since σ_j = ‖Av_j‖_2, we have Av_j ≠ 0 for j ∈ {1, ..., r}. But then span{Av_1, ..., Av_r} ⊆ span(A) and span{v_{r+1}, ..., v_n} ⊆ ker(A). For equality we need to show the opposite inclusions.

Outline of existence proof

Let V := [v_1, ..., v_n] ∈ C^{n,n}, where A^H A v_j = λ_j v_j and {v_1, ..., v_n} is orthonormal, and set σ_j := √λ_j for j = 1, ..., n. Define

u_j := Av_j / ‖Av_j‖_2 = (1/σ_j) Av_j, for j = 1, ..., r,

and U := [u_1, ..., u_m] ∈ C^{m,m}, where {u_{r+1}, ..., u_m} is defined by extending {u_1, ..., u_r} to an orthonormal basis for C^m. Then

UΣ = U[σ_1 e_1, ..., σ_r e_r, 0, ..., 0] = [σ_1 u_1, ..., σ_r u_r, 0, ..., 0] = [Av_1, ..., Av_n] = AV.

Since V is unitary we find UΣV^H = AVV^H = A.
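The construction in the proof can be carried out directly in MATLAB; this sketch follows the steps above (using null on U_1^H is one possible way to do the orthonormal extension):

```matlab
A = [1 1; 1 1; 0 0];  [m,n] = size(A);
[V,L] = eig(A'*A);
[lam,idx] = sort(diag(L),'descend');  V = V(:,idx);
sigma = sqrt(max(lam,0));  r = sum(sigma > 1e-12);
U1 = A*V(:,1:r)./sigma(1:r)';    % u_j = A*v_j/sigma_j
U2 = null(U1');                  % extend to an orthonormal basis of C^m
U = [U1, U2];
Sigma = zeros(m,n);  Sigma(1:r,1:r) = diag(sigma(1:r));
norm(A - U*Sigma*V')             % ~ 0
```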

Uniqueness

The singular values are unique. The matrices U and V are in general not unique.

Examples

(1/25)[11 48; 48 39] = (1/5)[3 4; 4 −3] · [3 0; 0 1] · (1/5)[3 4; −4 3].

(1/15)[14 4 16; 2 22 13] = (1/5)[3 4; 4 −3] · [2 0 0; 0 1 0] · (1/3)[1 2 2; 2 −2 1; 2 1 −2].

(1/15)[14 2; 4 22; 16 13] = (1/3)[1 2 2; 2 −2 1; 2 1 −2] · [2 0; 0 1; 0 0] · (1/5)[3 4; 4 −3].

[1 1; 1 1; 0 0] = [1/√2 −1/√2 0; 1/√2 1/√2 0; 0 0 1] · [2 0; 0 0; 0 0] · (1/√2)[1 1; 1 −1].

r < n < m

Find the singular value decomposition of A = [1 1; 1 1; 0 0].

B := A^T A = [2 2; 2 2],  B[1; 1] = 4[1; 1],  B[1; −1] = 0·[1; −1].

Thus σ_1 = 2, σ_2 = 0, v_1 = (1/√2)[1; 1], v_2 = (1/√2)[1; −1], and u_1 = Av_1/σ_1 = s_1/√2, where s_1 = [1, 1, 0]^T. Extend s_1 to a basis {s_1, s_2, s_3} for R^3 and apply Gram-Schmidt to {s_1, s_2, s_3}; a sketch follows below.
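Here is the sketch referred to above: a classical Gram-Schmidt extension of s_1 by the unit vectors e_2, e_3 (an assumed, convenient choice of s_2, s_3), followed by a check of the resulting factorization.

```matlab
S = [1 0 0; 1 1 0; 0 0 1];       % columns s1 = [1;1;0], s2 = e2, s3 = e3
U = zeros(3,3);
for k = 1:3
    u = S(:,k) - U(:,1:k-1)*(U(:,1:k-1)'*S(:,k));  % subtract projections
    U(:,k) = u/norm(u);
end
A = [1 1; 1 1; 0 0];
Sigma = [2 0; 0 0; 0 0];
V = [1 1; 1 -1]/sqrt(2);
norm(A - U*Sigma*V')             % ~ 0
```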

Comments on computing the SVD

The method we used to find the singular value decomposition in the previous example can be suitable for hand calculation with small matrices, but it is not appropriate as a general purpose numerical method. In particular, the Gram-Schmidt orthogonalization process, which can be used to extend u_1, ..., u_r to an orthonormal basis, is not numerically stable, and forming A^H A can introduce additional errors in the computation. State of the art computer implementations of the singular value decomposition use an adapted version of the QR-algorithm in which the matrix A^H A is never formed. The QR-algorithm is discussed in Chapter 20.

SVD using MATLAB

[U,S,V] = svd(A): the singular value decomposition.
s = svd(A): the singular values only.
[U,S,V] = svd(A,0): economy size; if m > n then U ∈ C^{m,n}, S ∈ R^{n,n}, and V ∈ C^{n,n} as before.
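For example (a small demonstration, with m > n):

```matlab
A = [1 1; 1 1; 0 0];
[U,S,V] = svd(A);       % full SVD: U is 3x3, S is 3x2, V is 2x2
s = svd(A);             % singular values only: [2; 0]
[U0,S0,V0] = svd(A,0);  % economy size: U0 is 3x2, S0 and V0 are 2x2
norm(A - U0*S0*V0')     % ~ 0
```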

The Singular Value Factorization (SVF)

SVD: A = UΣV^H with U, V square. Partition

U = [u_1, ..., u_m] = [U_1, U_2], U_1 ∈ C^{m,r}, U_2 ∈ C^{m,m−r},
V = [v_1, ..., v_n] = [V_1, V_2], V_1 ∈ C^{n,r}, V_2 ∈ C^{n,n−r}.

Then

A = [U_1, U_2] [Σ_1 0; 0 0] [V_1^H; V_2^H] = U_1 Σ_1 V_1^H,

the singular value factorization.
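In MATLAB the SVF is obtained from the full SVD by keeping the first r columns (a sketch; r is found here by thresholding the computed singular values):

```matlab
A = [14 2; 4 22; 16 13]/15;
[U,S,V] = svd(A);
r = sum(diag(S) > 1e-12);                  % numerical rank
U1 = U(:,1:r);  S1 = S(1:r,1:r);  V1 = V(:,1:r);
norm(A - U1*S1*V1')                        % ~ 0: A = U1*Sigma1*V1'
```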

Three forms of SVD

A = UΣV^H (SVD; U is m×m, Σ is m×n, V is n×n),
A = U_1 Σ_1 V_1^H (SVF; U_1 is m×r, Σ_1 is r×r, V_1 is n×r),
A = Σ_{i=1}^r σ_i u_i v_i^H (outer product form).
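The outer product form in MATLAB (same example matrix as above): summing the rank-one terms σ_i u_i v_i^H rebuilds A one term at a time.

```matlab
A = [14 2; 4 22; 16 13]/15;
[U,S,V] = svd(A);
Ak = zeros(size(A));
for i = 1:rank(A)
    Ak = Ak + S(i,i)*U(:,i)*V(:,i)';   % add rank-one term sigma_i*u_i*v_i'
end
norm(A - Ak)                           % ~ 0
```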

Normal and positive semidefinite matrices

If A is normal (A^H A = AA^H) then σ_j = |λ_j|. If A is symmetric positive semidefinite then σ_j = λ_j. Indeed, from the spectral decomposition A = UDU^H, with D = diag(λ_1, ..., λ_n) and U^H U = I, we get A^H A = U D^H D U^H with D^H D = diag(|λ_1|^2, ..., |λ_n|^2). For a positive semidefinite matrix the factorization A = UDU^H above is both an SVD and an SVF, provided we have sorted the eigenvalues in nonincreasing order.

Geometric Interpretation

[Figure: left, the unit circle S; right, its image AS, an ellipse with semi-axes σ_1 u_1 and σ_2 u_2.]

A := (1/25)[11 48; 48 39] ∈ R^{2,2},  U = (1/5)[3 4; 4 −3],

AS = {x : ‖Σ_1^{−1} U^T x‖_2^2 = 1} = {(x_1, x_2) : ((3/5)x_1 + (4/5)x_2)^2/9 + ((4/5)x_1 − (3/5)x_2)^2/1 = 1}.

AS is an ellipse. The singular values give the lengths of the semi-axes, and the semi-axes lie along the left singular vectors.

Singular vectors

The columns u_1, ..., u_m of U are called left singular vectors and the columns v_1, ..., v_n of V are called right singular vectors.

1. AV_1 = U_1 Σ_1, or Av_i = σ_i u_i for i = 1, ..., r,
2. AV_2 = 0, or Av_i = 0 for i = r+1, ..., n,
3. A^H U_1 = V_1 Σ_1, or A^H u_i = σ_i v_i for i = 1, ..., r,
4. A^H U_2 = 0, or A^H u_i = 0 for i = r+1, ..., m.    (4)

Consequently:

1. U_1 is an orthonormal basis for span(A),
2. V_2 is an orthonormal basis for ker(A),
3. V_1 is an orthonormal basis for span(A^H),
4. U_2 is an orthonormal basis for ker(A^H).    (5)

SVD of A^H A and AA^H

A = UΣV^H = U_1 Σ_1 V_1^H (SVD and SVF),
A^H A = V Σ^H Σ V^H = V_1 Σ_1^2 V_1^H (SVD and SVF),
A^H A V_1 = V_1 Σ_1^2,  A^H A V_2 = V_1 Σ_1^2 V_1^H V_2 = 0,
AA^H = U Σ Σ^H U^H = U_1 Σ_1^2 U_1^H (SVD and SVF),
AA^H U_1 = U_1 Σ_1^2,  AA^H U_2 = 0.

Rank and nullity relations

Corollary. Suppose A ∈ C^{m,n}. Then rank(A) + null(A) = n, rank(A) + null(A^H) = m, and rank(A) = rank(A^H).

Theorem. For any A ∈ C^{m,n} we have rank(A) = rank(A^H A) = rank(AA^H), null(A^H A) = null(A) and null(AA^H) = null(A^H), span(A^H A) = span(A^H) and ker(A^H A) = ker(A).

The Pseudoinverse

The pseudoinverse of A ∈ C^{m,n} is the matrix A† ∈ C^{n,m} given by A† := V_1 Σ_1^{−1} U_1^H, where A = U_1 Σ_1 V_1^H is the singular value factorization of A. A† is independent of the factorization chosen to represent it. If A is square and nonsingular then A†A = AA† = I and A† is the usual inverse of A. Any matrix has a pseudoinverse, so A† is a generalization of the usual inverse.

How to find the pseudoinverse

1. Find the SVF of A and use A† = V_1 Σ_1^{−1} U_1^H.
2. If A ∈ C^{m,n} has rank n then A† = (A^H A)^{−1} A^H.
3. If B ∈ C^{n,m} satisfies ABA = A, BAB = B, (BA)^H = BA, and (AB)^H = AB (the Moore-Penrose conditions), then B = A†.
4. Use MATLAB: B = pinv(A).

A sketch comparing these routes follows below.
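The sketch, on the example matrix from the earlier slides (full column rank, so route 2 applies):

```matlab
A = [14 2; 4 22; 16 13]/15;
[U,S,V] = svd(A);  r = rank(A);
B1 = V(:,1:r)*(S(1:r,1:r)\U(:,1:r)');  % route 1: V1*inv(Sigma1)*U1'
B2 = (A'*A)\A';                        % route 2: (A'A)^(-1)*A'
B4 = pinv(A);                          % route 4: MATLAB
norm(B1 - B2) + norm(B2 - B4)          % ~ 0: all three agree
% route 3: the Moore-Penrose conditions hold for B4
norm(A*B4*A - A) + norm(B4*A*B4 - B4)            % ~ 0
norm((B4*A)' - B4*A) + norm((A*B4)' - A*B4)      % ~ 0
```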

Example

A = (1/15)[14 2; 4 22; 16 13] = (1/3)[1 2 2; 2 −2 1; 2 1 −2] · [2 0; 0 1; 0 0] · (1/5)[3 4; 4 −3],

A† = (1/5)[3 4; 4 −3] · [1/2 0 0; 0 1 0] · (1/3)[1 2 2; 2 −2 1; 2 1 −2] = (1/30)[19 −10 14; −8 20 2] =: B.

One checks that ABA = A, BAB = B, (BA)^H = BA, and (AB)^H = AB. Alternatively, since A has full column rank,

A^T A = (1/25)[52 36; 36 73],  A† = (A^T A)^{−1} A^T = (1/100)[73 −36; −36 52] · (1/15)[14 4 16; 2 22 13],

which gives the same B.

Theory: Direct Sum and Orthogonal Sum

Suppose S and T are subspaces of a vector space (V, F). We define:
Sum: X := S + T := {s + t : s ∈ S and t ∈ T}.
Direct sum: if S ∩ T = {0}, then S ⊕ T := S + T.
Orthogonal sum: suppose (V, F, ⟨·,·⟩) is an inner product space. Then S ⊕ T is an orthogonal sum if ⟨s, t⟩ = 0 for all s ∈ S and all t ∈ T.

span(A) ⊕ ker(A^H) is an orthogonal sum with respect to the usual inner product ⟨s, t⟩ := s^H t. For if y = Ax ∈ span(A) and z ∈ ker(A^H) then y^H z = (Ax)^H z = x^H (A^H z) = 0.

Orthogonal complement: T = S^⊥ := {x ∈ V : ⟨s, x⟩ = 0 for all s ∈ S}.

Basic facts

Suppose S and T are subspaces of a vector space (V, F). Then:
- S + T = T + S, and S + T is a subspace of V.
- dim(S + T) = dim S + dim T − dim(S ∩ T).
- dim(S ⊕ T) = dim S + dim T.
- C^m = span(A) ⊕ ker(A^H).
- Every v ∈ S ⊕ T can be decomposed uniquely as v = s + t, where s ∈ S and t ∈ T. If S ⊕ T is an orthogonal sum then s is called the orthogonal projection of v into S.

[Figure: v decomposed as v = s + t, with s the orthogonal projection of v into S.]

Orthogonal Projections

The singular value decomposition and the pseudoinverse can be used to compute orthogonal projections into the subspaces span(A) and ker(A^H). Recall that if A = UΣV^H is the SVD of A and U = [U_1, U_2] as before, then U_1 is an orthonormal basis for span(A) and U_2 is an orthonormal basis for ker(A^H). Let b ∈ C^m. Then

b = UU^H b = [U_1, U_2] [U_1^H; U_2^H] b = U_1 U_1^H b + U_2 U_2^H b =: b_1 + b_2.

b_1 := U_1(U_1^H b) ∈ span(A) is the orthogonal projection into span(A), and b_1 = AA†b.
b_2 := U_2(U_2^H b) ∈ ker(A^H) is the orthogonal projection into ker(A^H), and b_2 = (I − AA†)b.
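The two projections in MATLAB (a sketch; the example matrix and b are assumed, any b ∈ C^m works):

```matlab
A = [1 1; 1 1; 0 0];  b = [1; 2; 3];
[U,S,V] = svd(A);  r = rank(A);
U1 = U(:,1:r);  U2 = U(:,r+1:end);
b1 = U1*(U1'*b);            % orthogonal projection into span(A)
b2 = U2*(U2'*b);            % orthogonal projection into ker(A')
norm(b - b1 - b2)           % ~ 0: b = b1 + b2
norm(b1 - A*pinv(A)*b)      % ~ 0: b1 = A*A^dagger*b
```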

Example

The singular value decomposition of A = [1 0; 0 1; 0 0] is A = I_3 A I_2, and

A† = I_2 [1 0 0; 0 1 0] I_3 = [1 0 0; 0 1 0].

AA† = [1 0; 0 1; 0 0][1 0 0; 0 1 0] = [1 0 0; 0 1 0; 0 0 0],  I_3 − AA† = [0 0 0; 0 0 0; 0 0 1].

If b = [b_1, b_2, b_3]^T, then b_1 = AA†b = [b_1, b_2, 0]^T and b_2 = (I_3 − AA†)b = [0, 0, b_3]^T.

Minmax and Maxmin Theorems

Let R(x) = R_A(x) := x^H A x / x^H x be the Rayleigh quotient.

Theorem (The Courant-Fischer Theorem). Suppose A ∈ C^{n,n} is Hermitian with eigenvalues λ_1, λ_2, ..., λ_n ordered so that λ_1 ≥ ... ≥ λ_n. Then for k = 1, ..., n

λ_k = min_{dim(S)=n−k+1} max_{x∈S, x≠0} R(x) = max_{dim(S)=k} min_{x∈S, x≠0} R(x).    (6)

Minmax and Maxmin Theorems for singular values

Theorem (The Courant-Fischer Theorem for Singular Values). Suppose A ∈ C^{m,n} has singular values σ_1, σ_2, ..., σ_n ordered so that σ_1 ≥ ... ≥ σ_n. Then for k = 1, ..., n

σ_k = min_{dim(S)=n−k+1} max_{x∈S, x≠0} ‖Ax‖_2/‖x‖_2 = max_{dim(S)=k} min_{x∈S, x≠0} ‖Ax‖_2/‖x‖_2.

Proof:

‖Ax‖_2^2/‖x‖_2^2 = ⟨Ax, Ax⟩/⟨x, x⟩ = ⟨x, A^H A x⟩/⟨x, x⟩ = R_{A^H A}(x),

so the result follows by applying the Courant-Fischer theorem to the Hermitian matrix A^H A and taking square roots.

The largest and smallest singular value

σ_1 = max_{x ∈ C^n, x≠0} ‖Ax‖_2/‖x‖_2 = max_{x ∈ C^n, ‖x‖_2=1} ‖Ax‖_2,
σ_n = min_{x ∈ C^n, x≠0} ‖Ax‖_2/‖x‖_2 = min_{x ∈ C^n, ‖x‖_2=1} ‖Ax‖_2.

Hoffman-Wielandt Theorem

Theorem (Eigenvalues). Suppose A, B ∈ C^{n,n} are both Hermitian, with eigenvalues λ_1 ≥ ... ≥ λ_n and μ_1 ≥ ... ≥ μ_n, respectively. Then

Σ_{j=1}^n |μ_j − λ_j|^2 ≤ ‖A − B‖_F^2 := Σ_{i=1}^n Σ_{j=1}^n |a_{ij} − b_{ij}|^2.

Hoffman-Wielandt Theorem

Theorem (Singular values). For any m, n ∈ N and A, B ∈ C^{m,n} we have

Σ_{j=1}^n |β_j − α_j|^2 ≤ ‖A − B‖_F^2,

where α_1 ≥ ... ≥ α_n and β_1 ≥ ... ≥ β_n are the singular values of A and B, respectively.
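A random spot check of the singular value version in MATLAB (sizes chosen arbitrarily):

```matlab
m = 5;  n = 3;
A = randn(m,n);  B = randn(m,n);
alpha = svd(A);  beta = svd(B);   % both sorted in nonincreasing order
lhs = sum((beta - alpha).^2);
rhs = norm(A - B,'fro')^2;
lhs <= rhs                        % always true (prints 1)
```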

Proof 1

Define C := [0 A; A^H 0] and D := [0 B; B^H 0] ∈ C^{m+n,m+n}. Then C^H = C and D^H = D. If C and D have eigenvalues λ_1 ≥ ... ≥ λ_{m+n} and μ_1 ≥ ... ≥ μ_{m+n}, respectively, then by the Hoffman-Wielandt theorem for eigenvalues

Σ_{j=1}^{m+n} |λ_j − μ_j|^2 ≤ ‖C − D‖_F^2.

Suppose A has rank r and SVD UΣV^H. Then Av_i = α_i u_i and A^H u_i = α_i v_i for i = 1, ..., r, while A^H u_i = 0 for i = r+1, ..., m and Av_i = 0 for i = r+1, ..., n.

Proof 2

[0 A; A^H 0][u_i; v_i] = [Av_i; A^H u_i] = [α_i u_i; α_i v_i] = α_i [u_i; v_i], i = 1, ..., r,
[0 A; A^H 0][u_i; −v_i] = [−Av_i; A^H u_i] = [−α_i u_i; α_i v_i] = −α_i [u_i; −v_i], i = 1, ..., r,
[0 A; A^H 0][u_i; 0] = [0; A^H u_i] = [0; 0], i = r+1, ..., m,
[0 A; A^H 0][0; v_i] = [Av_i; 0] = [0; 0], i = r+1, ..., n.

Proof 3

Thus C has the 2r eigenvalues α_1, −α_1, ..., α_r, −α_r and m + n − 2r additional zero eigenvalues. Similarly, if B has rank s, then D has the 2s eigenvalues β_1, −β_1, ..., β_s, −β_s and m + n − 2s additional zero eigenvalues. Let t := max(r, s). Sorted in nonincreasing order,

λ_1, ..., λ_{m+n} = α_1, ..., α_t, 0, ..., 0, −α_t, ..., −α_1,
μ_1, ..., μ_{m+n} = β_1, ..., β_t, 0, ..., 0, −β_t, ..., −β_1.

Proof 4

Σ_{j=1}^{m+n} |λ_j − μ_j|^2 = Σ_{i=1}^t |α_i − β_i|^2 + Σ_{i=1}^t |−α_i + β_i|^2 = 2 Σ_{i=1}^t |α_i − β_i|^2,

‖C − D‖_F^2 = ‖[0 (A−B); (A−B)^H 0]‖_F^2 = ‖B − A‖_F^2 + ‖(B − A)^H‖_F^2 = 2‖B − A‖_F^2.

Hence

Σ_{i=1}^t |α_i − β_i|^2 = (1/2) Σ_{j=1}^{m+n} |λ_j − μ_j|^2 ≤ (1/2)‖C − D‖_F^2 = ‖B − A‖_F^2.

Since t ≤ n and α_i = β_i = 0 for i = t+1, ..., n, we obtain the result.