STAT 309: MATHEMATICAL COMPUTATIONS I FALL 2017 LECTURE 5

Date: October 1, 2018, version 1. Comments, bug reports: lekheng@galton.uchicago.edu

1. existence of svd

Theorem 1 (Existence of SVD). Every matrix has a singular value decomposition (condensed version).

Proof. Let $A \in \mathbb{C}^{m\times n}$ and, for simplicity, assume that all its nonzero singular values are distinct. We define the matrix
\[
W = \begin{bmatrix} 0 & A \\ A^* & 0 \end{bmatrix} \in \mathbb{C}^{(m+n)\times(m+n)}.
\]
It is easy to verify that $W^* = W$ (the matrix is named after Wielandt, who was the first to consider it), and by the spectral theorem for Hermitian matrices, $W$ has an evd
\[
W = Z\Lambda Z^*,
\]
where $Z \in \mathbb{C}^{(m+n)\times(m+n)}$ is a unitary matrix and $\Lambda \in \mathbb{R}^{(m+n)\times(m+n)}$ is a diagonal matrix with real diagonal elements. If $z$ is an eigenvector of $W$, then we can write
\[
Wz = \sigma z, \qquad z = \begin{bmatrix} x \\ y \end{bmatrix},
\]
and therefore
\[
\begin{bmatrix} 0 & A \\ A^* & 0 \end{bmatrix}\begin{bmatrix} x \\ y \end{bmatrix} = \sigma \begin{bmatrix} x \\ y \end{bmatrix}
\]
or, equivalently,
\[
Ay = \sigma x, \qquad A^* x = \sigma y.
\]
Now suppose that we apply $W$ to the vector obtained from $z$ by negating $y$. Then we have
\[
\begin{bmatrix} 0 & A \\ A^* & 0 \end{bmatrix}\begin{bmatrix} x \\ -y \end{bmatrix} = \begin{bmatrix} -Ay \\ A^* x \end{bmatrix} = \begin{bmatrix} -\sigma x \\ \sigma y \end{bmatrix} = -\sigma \begin{bmatrix} x \\ -y \end{bmatrix}.
\]
In other words, if $\sigma$ is an eigenvalue of $W$, then $-\sigma$ is also an eigenvalue. So we may assume without loss of generality that
\[
\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r > 0 = \cdots = 0,
\]
where $r = \operatorname{rank}(A)$. So the diagonal matrix $\Lambda$ of eigenvalues of $W$ may be written as
\[
\Lambda = \operatorname{diag}(\sigma_1, \sigma_2, \dots, \sigma_r, -\sigma_1, -\sigma_2, \dots, -\sigma_r, 0, \dots, 0) \in \mathbb{C}^{(m+n)\times(m+n)}.
\]
Observe that there is a zero block of size $(m+n-2r)\times(m+n-2r)$ in the bottom right corner of $\Lambda$. We scale the eigenvector $z$ of $W$ so that $z^* z = 2$. Since $W$ is Hermitian, eigenvectors corresponding to the distinct eigenvalues $\sigma$ and $-\sigma$ are orthogonal, so it follows that
\[
\begin{bmatrix} x \\ y \end{bmatrix}^* \begin{bmatrix} x \\ -y \end{bmatrix} = 0.
\]

These yield the system of equations
\[
x^* x + y^* y = 2, \qquad x^* x - y^* y = 0,
\]
which has the unique solution $x^* x = 1$, $y^* y = 1$. Now note that we can represent the matrix of normalized eigenvectors of $W$ corresponding to nonzero eigenvalues (note that there are exactly $2r$ of these) as
\[
\widetilde{Z} = \frac{1}{\sqrt{2}}\begin{bmatrix} X & X \\ Y & -Y \end{bmatrix} \in \mathbb{C}^{(m+n)\times 2r}.
\]
Note that the factor $1/\sqrt{2}$ appears because of the way we have chosen the norm of $z$. We also let
\[
\widetilde{\Lambda} = \operatorname{diag}(\sigma_1, \dots, \sigma_r, -\sigma_1, \dots, -\sigma_r) \in \mathbb{C}^{2r\times 2r}.
\]
It is easy to see that
\[
Z\Lambda Z^* = \widetilde{Z}\widetilde{\Lambda}\widetilde{Z}^*,
\]
just by multiplying out the zero block in $\Lambda$. So we have
\[
\begin{bmatrix} 0 & A \\ A^* & 0 \end{bmatrix} = W = Z\Lambda Z^* = \widetilde{Z}\widetilde{\Lambda}\widetilde{Z}^*
= \frac{1}{2}\begin{bmatrix} X & X \\ Y & -Y \end{bmatrix}\begin{bmatrix} \Sigma_r & 0 \\ 0 & -\Sigma_r \end{bmatrix}\begin{bmatrix} X^* & Y^* \\ X^* & -Y^* \end{bmatrix}
= \frac{1}{2}\begin{bmatrix} X\Sigma_r & -X\Sigma_r \\ Y\Sigma_r & Y\Sigma_r \end{bmatrix}\begin{bmatrix} X^* & Y^* \\ X^* & -Y^* \end{bmatrix}
= \begin{bmatrix} 0 & X\Sigma_r Y^* \\ Y\Sigma_r X^* & 0 \end{bmatrix},
\]
and therefore
\[
A = X\Sigma_r Y^*, \qquad A^* = Y\Sigma_r X^*,
\]
where $X$ is an $m\times r$ matrix, $\Sigma_r$ is $r\times r$, $Y$ is $n\times r$, and $r$ is the rank of $A$. We have obtained the condensed svd of $A$. The last missing bit is the orthonormality of the columns of $X$ and $Y$. This follows from the fact that distinct columns of
\[
\begin{bmatrix} X & X \\ Y & -Y \end{bmatrix}
\]
are mutually orthogonal since they correspond to distinct eigenvalues, and so if we pick
\[
\begin{bmatrix} x_i \\ y_i \end{bmatrix}, \quad \begin{bmatrix} x_j \\ y_j \end{bmatrix}, \quad \begin{bmatrix} x_j \\ -y_j \end{bmatrix}
\]
for $i \ne j$ and take their inner products, we get
\[
x_i^* x_j + y_i^* y_j = 0, \qquad x_i^* x_j - y_i^* y_j = 0.
\]
Adding them gives $x_i^* x_j = 0$ and subtracting them gives $y_i^* y_j = 0$ for all $i \ne j$, as required.

See Theorem 4.1 in Trefethen and Bau for an alternative non-constructive proof that does not require the use of the spectral theorem.
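As a quick numerical illustration of the construction in the proof (this sketch is an addition to these notes, not part of the original; it assumes NumPy and an arbitrary random test matrix), we can form the Wielandt matrix of a small $A$, read off singular values and singular vector pairs from its eigendecomposition, and compare against a library SVD:

    import numpy as np

    rng = np.random.default_rng(0)
    m, n = 5, 3
    A = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))

    # Wielandt matrix W = [[0, A], [A*, 0]] is Hermitian
    W = np.block([[np.zeros((m, m)), A], [A.conj().T, np.zeros((n, n))]])
    eigval, eigvec = np.linalg.eigh(W)             # spectral theorem: W = Z Lambda Z*

    # the positive eigenvalues of W are the nonzero singular values of A
    sigma_from_W = np.sort(eigval[eigval > 1e-10])[::-1]
    sigma = np.linalg.svd(A, compute_uv=False)
    print(np.allclose(sigma_from_W, sigma))        # True

    # an eigenvector z = [x; y] for eigenvalue sigma > 0 satisfies Ay = sigma*x, A*x = sigma*y
    k = np.argmax(eigval)
    x, y = eigvec[:m, k], eigvec[m:, k]
    print(np.allclose(A @ y, eigval[k] * x), np.allclose(A.conj().T @ x, eigval[k] * y))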

2. other characterizations of svd

The proof of the above theorem gives us two more characterizations of singular values and singular vectors:

(i) in terms of eigenvalues and eigenvectors of an $(m+n)\times(m+n)$ Hermitian matrix:
\[
\begin{bmatrix} 0 & A \\ A^* & 0 \end{bmatrix} = \frac{1}{2}\begin{bmatrix} U & U \\ V & -V \end{bmatrix}\begin{bmatrix} \Sigma & 0 \\ 0 & -\Sigma \end{bmatrix}\begin{bmatrix} U & U \\ V & -V \end{bmatrix}^*;
\]
(ii) in terms of a coupled system of equations:
\[
\begin{cases} Av = \sigma u, \\ A^* u = \sigma v. \end{cases}
\]
The following is yet another way to characterize them, in terms of eigenvalues/eigenvectors of an $m\times m$ Hermitian matrix and an $n\times n$ Hermitian matrix.

Lemma 1. The squares of the singular values of a matrix $A$ are eigenvalues of $AA^*$ and $A^*A$. The left singular vectors of $A$ are eigenvectors of $AA^*$ and the right singular vectors of $A$ are eigenvectors of $A^*A$.

Proof. From the relationships $Ay = \sigma x$, $A^* x = \sigma y$, we obtain
\[
A^*Ay = \sigma^2 y, \qquad AA^*x = \sigma^2 x.
\]
Therefore, if $\pm\sigma$ are eigenvalues of $W$, then $\sigma^2$ is an eigenvalue of both $AA^*$ and $A^*A$. Also
\[
AA^* = (U\Sigma V^*)(V\Sigma^{\mathsf{T}} U^*) = U\Sigma\Sigma^{\mathsf{T}} U^*, \qquad
A^*A = (V\Sigma^{\mathsf{T}} U^*)(U\Sigma V^*) = V\Sigma^{\mathsf{T}}\Sigma V^*.
\]
Note that $\Sigma^* = \Sigma^{\mathsf{T}}$ since singular values are real. The matrices $\Sigma^{\mathsf{T}}\Sigma$ and $\Sigma\Sigma^{\mathsf{T}}$ are respectively $n\times n$ and $m\times m$ diagonal matrices with diagonal elements $\sigma_i^2$.
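A small numerical check of Lemma 1 (again an added illustration, not part of the original notes; it assumes NumPy and a random real test matrix):

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.standard_normal((4, 6))

    U, s, Vh = np.linalg.svd(A)                         # full svd: A = U diag(s) V*
    evals_AAt = np.sort(np.linalg.eigvalsh(A @ A.T))[::-1]
    evals_AtA = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]

    # squares of the singular values are eigenvalues of AA* and A*A
    print(np.allclose(evals_AAt, s**2))                 # all m = 4 of them
    print(np.allclose(evals_AtA[:4], s**2))             # the remaining n - m eigenvalues are 0

    # columns of U are eigenvectors of AA*, columns of V are eigenvectors of A*A
    print(np.allclose((A @ A.T) @ U[:, 0], s[0]**2 * U[:, 0]))
    print(np.allclose((A.T @ A) @ Vh[0], s[0]**2 * Vh[0]))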

The svd is something like a Swiss army knife of linear algebra, matrix theory, and numerical linear algebra: you can do a lot with it. Over the next few sections we will see that the singular value decomposition is a singularly powerful tool; once we have it, we can solve just about any problem involving matrices:

- given $A \in \mathbb{C}^{m\times n}$ and $b \in \operatorname{im}(A) \subseteq \mathbb{C}^m$, find all solutions of $Ax = b$;
- given $A \in \mathrm{GL}(n)$, find $A^{-1}$;
- given $A \in \mathbb{C}^{m\times n}$, find $\|A\|_2$ and $\|A\|_F$;
- given $A \in \mathbb{C}^{m\times n}$, find $\|A\|_{\sigma,p,k}$ for $p \in [1,\infty]$ and $k \in \mathbb{N}$;
- given $A \in \mathbb{C}^{n\times n}$, find $|\det(A)|$;
- given $A \in \mathbb{C}^{m\times n}$, find $A^\dagger$;
- given $A \in \mathbb{C}^{m\times n}$ and $b \in \mathbb{C}^m$, find all solutions of $\min_{x\in\mathbb{C}^n} \|Ax - b\|_2$.

The good news is that, unlike the Jordan canonical form, the svd is actually computable. There are two main methods to compute it, Golub-Reinsch and Golub-Kahan; we will look at these briefly later. Right now all you need to know is that you can call Matlab to give you the svd, both the full and compact versions.

In all of the following we shall assume that we have the full svd $A = U\Sigma V^*$; furthermore $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r > 0$ are the nonzero singular values of $A$ and $r = \operatorname{rank}(A)$.

3. solving linear systems

Assume that we want to solve a linear system $Ax = b$ that is known to be consistent, i.e., $b \in \operatorname{im}(A)$. First we form the matrix-vector product $c = U^* b$ and let $y = V^* x$. Then $Ax = b$ becomes $\Sigma y = c$. Since $Ax = b$ is consistent,
\[
\begin{bmatrix} \sigma_1 & & & \\ & \ddots & & \\ & & \sigma_r & \\ & & & 0 \end{bmatrix}
\begin{bmatrix} y_1 \\ \vdots \\ y_r \\ y_{r+1} \\ \vdots \\ y_n \end{bmatrix}
= \begin{bmatrix} c_1 \\ \vdots \\ c_r \\ c_{r+1} \\ \vdots \\ c_m \end{bmatrix}
\]
is consistent, which means that we must have $c_{r+1} = \cdots = c_m = 0$, i.e.,
\[
c = \begin{bmatrix} c_1, \dots, c_r, 0, \dots, 0 \end{bmatrix}^{\mathsf{T}}.
\]
Now let
\[
y = \begin{bmatrix} c_1/\sigma_1, \dots, c_r/\sigma_r, y_{r+1}, \dots, y_n \end{bmatrix}^{\mathsf{T}},
\]
where $y_{r+1}, \dots, y_n$ are free parameters. All solutions of $Ax = b$ are given by
\[
x = Vy
\]
for some choice of $y_{r+1}, \dots, y_n$. To see this, observe that
\[
Ax = U\Sigma V^* V y = U\Sigma y = U\begin{bmatrix} c_1, \dots, c_r, 0, \dots, 0 \end{bmatrix}^{\mathsf{T}} = Uc = UU^* b = b.
\]
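A sketch of this recipe in NumPy (an added illustration; the rank-deficient test matrix and the rank tolerance are arbitrary choices, not part of the original notes):

    import numpy as np

    rng = np.random.default_rng(2)
    m, n = 6, 5
    A = rng.standard_normal((m, 3)) @ rng.standard_normal((3, n))   # rank 3
    b = A @ rng.standard_normal(n)                                   # consistent by construction

    U, s, Vh = np.linalg.svd(A)
    c = U.conj().T @ b
    tol = s[0] * max(m, n) * np.finfo(float).eps
    r = int(np.sum(s > tol))                      # numerical rank
    print(np.allclose(c[r:], 0))                  # consistency: c_{r+1} = ... = c_m = 0

    # one solution: y_i = c_i / sigma_i for i <= r, free parameters y_{r+1}, ..., y_n set to 0
    y = np.zeros(n)
    y[:r] = c[:r] / s[:r]
    x = Vh.conj().T @ y
    print(np.allclose(A @ x, b))                  # True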

4. inverting nonsingular matrices

Note that only square matrices can have inverses; when we say $A$ is nonsingular or invertible, the fact that $A$ is square is implied. By the way, the set of all nonsingular matrices in $\mathbb{C}^{n\times n}$ forms a group under matrix multiplication; it is called the general linear group and denoted
\[
\mathrm{GL}(n) := \{A \in \mathbb{C}^{n\times n} : \det(A) \ne 0\}.
\]
If $A \in \mathbb{C}^{n\times n}$ is nonsingular, then $\operatorname{rank}(A) = n$ and so $\sigma_1, \dots, \sigma_n$ are all nonzero. So
\[
A^{-1} = (U\Sigma V^*)^{-1} = (V^*)^{-1}\Sigma^{-1}U^{-1} = V\Sigma^{-1}U^*,
\]
where
\[
\Sigma^{-1} = \operatorname{diag}(\sigma_1^{-1}, \dots, \sigma_n^{-1}).
\]
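For example, a minimal NumPy check (an added illustration, not part of the original notes) that $V\Sigma^{-1}U^*$ reproduces the inverse:

    import numpy as np

    rng = np.random.default_rng(3)
    A = rng.standard_normal((4, 4))                       # nonsingular with probability 1

    U, s, Vh = np.linalg.svd(A)
    A_inv = Vh.conj().T @ np.diag(1.0 / s) @ U.conj().T   # A^{-1} = V Sigma^{-1} U*
    print(np.allclose(A_inv, np.linalg.inv(A)))           # True
    print(np.allclose(A @ A_inv, np.eye(4)))              # True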

5. computing matrix 2-norm and F-norm

Recall the definition of the matrix 2-norm,
\[
\|A\|_2 = \max_{x\ne 0} \frac{\|Ax\|_2}{\|x\|_2},
\]
which is also called the spectral norm. We examine the expression
\[
\|Ax\|_2^2 = x^* A^* A x.
\]
The matrix $A^*A$ is Hermitian and positive semidefinite, i.e., $x^*(A^*A)x \ge 0$ for all $x \in \mathbb{C}^n$.

Exercise: show that if a matrix $M$ is Hermitian positive semidefinite, then its evd and svd coincide.

As such, $A^*A$ has svd (equivalently, evd) given by
\[
A^*A = V\Lambda V^*,
\]
where $V$ is a unitary matrix whose columns are the eigenvectors of $A^*A$, and $\Lambda$ is a diagonal matrix of the form
\[
\Lambda = \operatorname{diag}(\sigma_1^2, \dots, \sigma_n^2),
\]
where each $\sigma_i^2$ is nonnegative and an eigenvalue of $A^*A$. These eigenvalues can be ordered such that
\[
\sigma_1^2 \ge \sigma_2^2 \ge \cdots \ge \sigma_r^2 > 0, \qquad \sigma_{r+1}^2 = \cdots = \sigma_n^2 = 0,
\]
where $r = \operatorname{rank}(A)$. Let $x \in \mathbb{C}^n$ and let $w = V^* x$; then we obtain
\[
\frac{\|Ax\|_2^2}{\|x\|_2^2} = \frac{x^* A^* A x}{x^* x} = \frac{x^* V \Lambda V^* x}{x^* V V^* x} = \frac{w^* \Lambda w}{w^* w}
= \frac{\sum_{i=1}^n \sigma_i^2 |w_i|^2}{\sum_{i=1}^n |w_i|^2} \le \sigma_1^2.
\]

Exercise: show that if $a_1, \dots, a_n$ are nonnegative numbers, then
\[
\max \frac{a_1 x_1 + \cdots + a_n x_n}{x_1 + \cdots + x_n} = \max(a_1, \dots, a_n),
\]
where the maximum is taken over $x_1, \dots, x_n \in [0, \infty)$, not all $0$.

It follows that $\|Ax\|_2^2 / \|x\|_2^2 \le \sigma_1^2$ for all nonzero $x$. Since $V$ is a unitary matrix, there exists an $x$ such that
\[
w = V^* x = \begin{bmatrix} 1, 0, \dots, 0 \end{bmatrix}^{\mathsf{T}} = e_1,
\]
in which case
\[
x^* A^* A x = e_1^* \Lambda e_1 = \sigma_1^2.
\]
In fact, this vector $x$ is the eigenvector of $A^*A$ corresponding to the eigenvalue $\sigma_1^2$. We conclude that
\[
\|A\|_2 = \sigma_1.
\]
Note that we have also shown that $\|A\|_2^2 = \rho(A^*A)$, since the eigenvalues of $A^*A$ are simply the squares of the singular values of $A$ by Lemma 1.

Another way to arrive at the same conclusion is to use the fact from Homework 1 that the 2-norm of a vector is invariant under multiplication by a unitary matrix, i.e., if $Q^*Q = I$, then $\|x\|_2 = \|Qx\|_2$, from which it follows that
\[
\|A\|_2 = \|U\Sigma V^*\|_2 = \|\Sigma\|_2 = \sigma_1.
\]
By the same problem in Homework 1, the Frobenius norm is also unitarily invariant. This yields an expression in terms of singular values:
\[
\|A\|_F = \|U\Sigma V^*\|_F = \|\Sigma\|_F = \sqrt{\sigma_1^2 + \cdots + \sigma_r^2},
\]
where $r = \operatorname{rank}(A)$.
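A quick NumPy verification of these identities (an added illustration, not part of the original notes):

    import numpy as np

    rng = np.random.default_rng(4)
    A = rng.standard_normal((5, 3))
    s = np.linalg.svd(A, compute_uv=False)

    print(np.isclose(s[0], np.linalg.norm(A, 2)))                          # ||A||_2 = sigma_1
    print(np.isclose(np.sqrt(np.sum(s**2)), np.linalg.norm(A, 'fro')))     # ||A||_F
    print(np.isclose(s[0]**2, np.max(np.linalg.eigvalsh(A.T @ A))))        # ||A||_2^2 = rho(A*A)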

6. computing other matrix norms

The Schatten and Ky Fan norms can be expressed in terms of singular values; this is cheating a bit, because we will in fact define them in terms of singular values. For any $p \in [1, \infty)$, the Schatten $p$-norm of $A \in \mathbb{C}^{m\times n}$ is
\[
\|A\|_{\sigma,p} := \left[\sum_{i=1}^{\min(m,n)} \sigma_i(A)^p\right]^{1/p},
\]
and for $p = \infty$, we have
\[
\|A\|_{\sigma,\infty} = \max\{\sigma_1(A), \dots, \sigma_{\min(m,n)}(A)\}.
\]
Of course, the sum stops at $r = \operatorname{rank}(A)$ since $\sigma_{r+1}(A) = \cdots = \sigma_{\min(m,n)}(A) = 0$. As usual, we also have
\[
\lim_{p\to\infty} \|A\|_{\sigma,p} = \|A\|_{\sigma,\infty} \qquad (6.1)
\]
for all $A \in \mathbb{C}^{m\times n}$. For the special values $p = 1, 2, \infty$, we have:

- $p = 1$: nuclear norm, $\|A\|_{\sigma,1} = \sigma_1(A) + \cdots + \sigma_{\min(m,n)}(A) = \|A\|_*$;
- $p = 2$: Frobenius norm, $\|A\|_{\sigma,2} = \sqrt{\sigma_1(A)^2 + \cdots + \sigma_{\min(m,n)}(A)^2} = \|A\|_F$;
- $p = \infty$: spectral norm, $\|A\|_{\sigma,\infty} = \max\{\sigma_1(A), \dots, \sigma_{\min(m,n)}(A)\} = \sigma_1(A) = \|A\|_2$.

In fact, one may also define the Schatten $p$-norm for values of $p \in [0, 1)$ by dropping the root $1/p$ outside the sum:
\[
\|A\|_{\sigma,p} := \sum_{i=1}^{\min(m,n)} \sigma_i(A)^p.
\]
These will not be norms in the usual sense of the word because they do not satisfy the triangle inequality, but they satisfy an analogue of (6.1):
\[
\lim_{p\to 0} \|A\|_{\sigma,p} = \operatorname{rank}(A).
\]
We may regard matrix rank as the Schatten $0$-norm, since
\[
\|A\|_{\sigma,0} = \sigma_1(A)^0 + \cdots + \sigma_{\min(m,n)}(A)^0 = \operatorname{rank}(A),
\]
provided we define $0^0 := 0$. Schatten $p$-norms for $p < 1$ are useful in the same way $\ell^p$-norms are useful for $p < 1$, namely, as continuous surrogates of matrix rank (vector $\ell^p$-norms for $p < 1$ are often used as continuous surrogates of the $\ell^0$-norm, i.e., a count of the number of nonzero entries), which is a discontinuous function on $\mathbb{C}^{m\times n}$. The nuclear norm, i.e., the Schatten $1$-norm, being convex in addition to continuous, is the most popular surrogate for matrix rank.

For any $p \in [1, \infty)$ and $k \in \mathbb{N}$, the Ky Fan $(p, k)$-norm of $A \in \mathbb{C}^{m\times n}$ is
\[
\|A\|_{\sigma,p,k} := \left[\sum_{i=1}^{k} \sigma_i(A)^p\right]^{1/p}.
\]
Clearly, for any $A \in \mathbb{C}^{m\times n}$,
\[
\|A\|_{\sigma,p,\min(m,n)} = \|A\|_{\sigma,p},
\]
and so Ky Fan norms generalize Schatten norms.
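The following sketch (an added illustration; the helper names `schatten_norm` and `ky_fan_norm` are hypothetical, not a library API) computes these norms directly from the singular values:

    import numpy as np

    def schatten_norm(A, p):
        # Schatten p-norm ||A||_{sigma,p}; p = np.inf gives the spectral norm
        s = np.linalg.svd(A, compute_uv=False)
        return s[0] if np.isinf(p) else np.sum(s**p) ** (1.0 / p)

    def ky_fan_norm(A, p, k):
        # Ky Fan (p, k)-norm: Schatten p-norm of the k largest singular values
        s = np.linalg.svd(A, compute_uv=False)[:k]
        return np.sum(s**p) ** (1.0 / p)

    A = np.diag([3.0, 2.0, 1.0])
    print(schatten_norm(A, 1))          # nuclear norm  = 6
    print(schatten_norm(A, 2))          # Frobenius     = sqrt(14)
    print(schatten_norm(A, np.inf))     # spectral norm = 3
    print(ky_fan_norm(A, 1, 2))         # sum of the two largest singular values = 5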

7. computing magnitude of determinant

Note that determinants are only defined for square matrices $A \in \mathbb{C}^{n\times n}$. Recall the formula
\[
\det(A) = \prod_{i=1}^{n} \lambda_i(A), \qquad (7.1)
\]
where $\lambda_i(A) \in \mathbb{C}$ is the $i$th eigenvalue of $A$; recall also that every $n\times n$ matrix has exactly $n$ eigenvalues counted with multiplicity. By convention we usually order eigenvalues in decreasing order of magnitude, i.e.,
\[
|\lambda_1(A)| \ge |\lambda_2(A)| \ge \cdots \ge |\lambda_n(A)|.
\]
Now, if we have only the singular values of $A$, we cannot get $\det(A)$ like we did with its eigenvalues in (7.1); however, we can get the magnitude of the determinant:
\[
|\det(A)| = |\det(U\Sigma V^*)| = |\det(U)|\,\det(\Sigma)\,|\det(V^*)| = \det(\Sigma) = \det\operatorname{diag}(\sigma_1,\dots,\sigma_n) = \prod_{i=1}^{n} \sigma_i(A),
\]
since $|\det(U)| = |\det(V^*)| = 1$.

Exercise: show that all eigenvalues of a unitary matrix $U$ must have absolute value $1$, and so $|\det(U)| = 1$ by (7.1).

8. computing pseudoinverse

As we mentioned above, only square matrices $A \in \mathbb{C}^{n\times n}$ may have an inverse, i.e., an $X \in \mathbb{C}^{n\times n}$ such that $AX = XA = I$. If such an $X$ exists, we denote it by $A^{-1}$.

Exercise: show that $A^{-1}$, if it exists, must be unique.

Can we define something that behaves like an inverse (we will say exactly what this means in the next lecture when we discuss the minimum length least squares problem) for square matrices that are not invertible in the traditional sense? More generally, can we define some kind of inverse for rectangular matrices? Such considerations lead us to the notion of a pseudoinverse. The most famous one is the Moore-Penrose pseudoinverse.

Theorem 2 (Moore-Penrose). For any $A \in \mathbb{C}^{m\times n}$, there exists an $X \in \mathbb{C}^{n\times m}$ satisfying
(i) $(AX)^* = AX$, (ii) $(XA)^* = XA$, (iii) $XAX = X$, (iv) $AXA = A$.
Furthermore, the $X$ satisfying these four conditions must be unique and is denoted by $X = A^\dagger$.

Other types of pseudoinverses, sometimes also called generalized inverses, may be defined by choosing a subset of these four properties. The Moore-Penrose theorem is actually true over any field, but for $\mathbb{C}$ (and also $\mathbb{R}$) we can say a lot more: in fact, we can compute $A^\dagger$ explicitly using the svd.

Easy fact: if $A \in \mathbb{C}^{n\times n}$ is invertible, then
\[
A^\dagger = A^{-1}. \qquad (8.1)
\]
To show this, just see that all four properties in the theorem hold true if we plug in $X = A^{-1}$.

Another easy fact: if $D = \operatorname{diag}(d_1, \dots, d_{\min(m,n)}) \in \mathbb{C}^{m\times n}$ is diagonal in the sense that $d_{ij} = 0$ for all $i \ne j$, then
\[
D^\dagger = \operatorname{diag}(\delta_1, \dots, \delta_{\min(m,n)}) \in \mathbb{C}^{n\times m}, \qquad (8.2)
\]
where
\[
\delta_i = \begin{cases} 1/d_i & \text{if } d_i \ne 0, \\ 0 & \text{if } d_i = 0. \end{cases}
\]
To show this, just see that all four properties in the theorem hold true if we plug in $X = \operatorname{diag}(\delta_1, \dots, \delta_{\min(m,n)})$.

In general,
\[
(AB)^\dagger \ne B^\dagger A^\dagger, \qquad AA^\dagger \ne I, \qquad A^\dagger A \ne I.
\]
We can get $A^\dagger$ in terms of the svd of $A \in \mathbb{C}^{m\times n}$: if $A = U\Sigma V^*$, then
\[
A^\dagger = V\Sigma^\dagger U^*, \qquad (8.3)
\]
where
\[
\Sigma^\dagger = \begin{bmatrix} \operatorname{diag}(\sigma_1^{-1}, \dots, \sigma_r^{-1}) & 0 \\ 0 & 0 \end{bmatrix} \in \mathbb{C}^{n\times m}.
\]
To see this, just verify that (8.3) satisfies the four properties in Theorem 2. We may also deduce from (8.3) that
\[
A^\dagger = (A^*A)^{-1}A^* \quad \text{if $A$ has full column rank } (r = n),
\]
and that
\[
A^\dagger = A^*(AA^*)^{-1} \quad \text{if $A$ has full row rank } (r = m).
\]
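A NumPy sketch (an added illustration, not part of the original notes) of $|\det|$ from singular values and of $A^\dagger = V\Sigma^\dagger U^*$, checked against the four Penrose conditions:

    import numpy as np

    rng = np.random.default_rng(5)

    # |det(A)| is the product of the singular values (square A)
    B = rng.standard_normal((4, 4))
    print(np.isclose(abs(np.linalg.det(B)), np.prod(np.linalg.svd(B, compute_uv=False))))

    # Moore-Penrose pseudoinverse via the svd: A^+ = V Sigma^+ U*
    A = rng.standard_normal((5, 3)) @ rng.standard_normal((3, 4))        # 5 x 4, rank 3
    U, s, Vh = np.linalg.svd(A)
    tol = s[0] * max(A.shape) * np.finfo(float).eps
    s_pinv = np.where(s > tol, 1.0 / np.maximum(s, tol), 0.0)            # invert only nonzero sigma
    Sigma_pinv = np.zeros((A.shape[1], A.shape[0]))                      # n x m
    np.fill_diagonal(Sigma_pinv, s_pinv)
    A_pinv = Vh.conj().T @ Sigma_pinv @ U.conj().T
    print(np.allclose(A_pinv, np.linalg.pinv(A)))                        # True

    # the four Penrose conditions
    print(np.allclose(A @ A_pinv @ A, A), np.allclose(A_pinv @ A @ A_pinv, A_pinv))
    print(np.allclose((A @ A_pinv).conj().T, A @ A_pinv), np.allclose((A_pinv @ A).conj().T, A_pinv @ A))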

9. solving least squares problems

Given $A \in \mathbb{C}^{m\times n}$ and $b \in \mathbb{C}^m$, the least squares problem asks us to find $x \in \mathbb{C}^n$ so that $\|b - Ax\|_2$ is minimized. Note that $x$ minimizes $\|b - Ax\|_2$ iff it minimizes $\|b - Ax\|_2^2$, so whether we write the square or not doesn't really matter. Using the svd of $A$ and the unitary invariance of the vector 2-norm in Homework 1, we can simplify this minimization problem as follows:
\[
\|b - Ax\|_2^2 = \|b - U\Sigma V^* x\|_2^2 = \|U^*b - \Sigma V^* x\|_2^2 = \|c - \Sigma y\|_2^2
= |c_1 - \sigma_1 y_1|^2 + \cdots + |c_r - \sigma_r y_r|^2 + |c_{r+1}|^2 + \cdots + |c_m|^2,
\]
where $c = U^*b$ and $y = V^*x$. So we see that in order to minimize $\|Ax - b\|_2$, we must set $y_i = c_i/\sigma_i$ for $i = 1, \dots, r$. The unknowns $y_i$, for $i = r+1, \dots, n$, can have any value, since they do not affect the value of $\|c - \Sigma y\|_2$. So all the least squares solutions are of the form
\[
x = Vy = V\begin{bmatrix} c_1/\sigma_1, \dots, c_r/\sigma_r, y_{r+1}, \dots, y_n \end{bmatrix}^{\mathsf{T}},
\]
where $y_{r+1}, \dots, y_n \in \mathbb{C}$ are arbitrary. Our analysis above also shows that the minimum value of $\|b - Ax\|_2$ is given by
\[
\min_{x\in\mathbb{C}^n} \|b - Ax\|_2 = \sqrt{|c_{r+1}|^2 + \cdots + |c_m|^2}.
\]
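A minimal NumPy example of solving least squares this way (an added illustration, not part of the original notes; shown for the full column rank case):

    import numpy as np

    rng = np.random.default_rng(6)
    m, n = 8, 5
    A = rng.standard_normal((m, n))                 # full column rank, so r = n
    b = rng.standard_normal(m)

    U, s, Vh = np.linalg.svd(A)
    c = U.conj().T @ b
    y = c[:n] / s                                   # y_i = c_i / sigma_i, i = 1, ..., r
    x = Vh.conj().T @ y

    x_ref = np.linalg.lstsq(A, b, rcond=None)[0]
    print(np.allclose(x, x_ref))                                          # True
    print(np.isclose(np.linalg.norm(b - A @ x), np.linalg.norm(c[n:])))   # minimum residual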

10. solving minimum length least squares problems

One of the best-known applications of the svd is that it can be used to obtain the solution to the problem
\[
\|b - Ax\|_2 = \min, \qquad \|x\|_2 = \min.
\]
Alternatively, we can write
\[
\min\Bigl\{ \|x\|_2 : x \in \operatorname*{argmin}_{x\in\mathbb{C}^n} \|b - Ax\|_2 \Bigr\}
\]
or, using the normal equations,
\[
\min\{ \|x\|_2 : A^*Ax = A^*b \}.
\]
Notation: for $f : X \to \mathbb{R}$ and $S \subseteq X$, we write
\[
\operatorname*{argmin}_{x\in S} f(x) \quad\text{or}\quad \operatorname{argmin}\{f(x) : x \in S\}
\]
for the set of minimizers, i.e.,
\[
\operatorname*{argmin}_{x\in S} f(x) = \Bigl\{ x^\star \in S : f(x^\star) = \min_{x\in S} f(x) \Bigr\}.
\]
We often write (sloppily) $x^\star = \operatorname{argmin} f(x)$ or $\operatorname{argmin} f(x) = x^\star$ to mean that $x^\star$ is a minimizer of $f$ over $S$, even though the proper notation ought to be $x^\star \in \operatorname{argmin} f(x)$.

Note from the last part of the previous section that if $A$ does not have full rank, then there are infinitely many solutions to the least squares problem, because the residual does not depend on $y_{r+1}, \dots, y_n$; these are thus free parameters that we may choose freely:
\[
x = Vy = V\begin{bmatrix} c_1/\sigma_1, \dots, c_r/\sigma_r, y_{r+1}, \dots, y_n \end{bmatrix}^{\mathsf{T}}, \qquad (10.1)
\]
where $c = U^*b$. We claim that the unique solution with minimum 2-norm is given by setting $y_{r+1} = \cdots = y_n = 0$, i.e.,
\[
x^+ = V\begin{bmatrix} c_1/\sigma_1, \dots, c_r/\sigma_r, 0, \dots, 0 \end{bmatrix}^{\mathsf{T}}, \qquad (10.2)
\]
where $c = U^*b$. This follows because (recall that $\|Vy\|_2 = \|y\|_2$ since $V$ is unitary)
\[
\|x^+\|_2 = \sqrt{\Bigl|\frac{c_1}{\sigma_1}\Bigr|^2 + \cdots + \Bigl|\frac{c_r}{\sigma_r}\Bigr|^2}
\le \sqrt{\Bigl|\frac{c_1}{\sigma_1}\Bigr|^2 + \cdots + \Bigl|\frac{c_r}{\sigma_r}\Bigr|^2 + |y_{r+1}|^2 + \cdots + |y_n|^2} = \|x\|_2,
\]
whatever our choice of $y_{r+1}, \dots, y_n$ in (10.1).

We may also write (10.2) in terms of the Moore-Penrose pseudoinverse, since $\Sigma^\dagger$ has precisely the right form:
\[
x^+ = V\Sigma^\dagger c = V\Sigma^\dagger U^* b = A^\dagger b,
\]
where
\[
\Sigma^\dagger = \begin{bmatrix} \operatorname{diag}(\sigma_1^{-1}, \dots, \sigma_r^{-1}) & 0 \\ 0 & 0 \end{bmatrix} \in \mathbb{C}^{n\times m}.
\]
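A NumPy sketch (an added illustration, not part of the original notes) of the minimum norm solution $x^+ = A^\dagger b$ for a rank-deficient $A$:

    import numpy as np

    rng = np.random.default_rng(7)
    m, n = 7, 5
    A = rng.standard_normal((m, 3)) @ rng.standard_normal((3, n))   # rank deficient
    b = rng.standard_normal(m)

    U, s, Vh = np.linalg.svd(A)
    c = U.conj().T @ b
    tol = s[0] * max(m, n) * np.finfo(float).eps
    r = int(np.sum(s > tol))

    y = np.zeros(n)
    y[:r] = c[:r] / s[:r]               # free parameters y_{r+1}, ..., y_n set to zero
    x_plus = Vh.conj().T @ y

    # x+ = A^+ b is the least squares solution of minimum 2-norm
    print(np.allclose(x_plus, np.linalg.pinv(A) @ b))                # True
    print(np.allclose(x_plus, np.linalg.lstsq(A, b, rcond=None)[0])) # lstsq also returns it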