THE SINGULAR VALUE DECOMPOSITION AND LOW RANK APPROXIMATION
MANTAS MAŽEIKA

Abstract. The purpose of this paper is to present a largely self-contained proof of the singular value decomposition (SVD), and to explore its application to the low rank approximation problem. We begin by proving background concepts used throughout the paper. We then develop the SVD by way of the polar decomposition. Finally, we show that the SVD can be used to achieve the best low rank approximation of a matrix with respect to a large family of norms.

Contents
1. Background Concepts
   1.1. Self-adjoint Linear Maps
   1.2. The Spectral Theorem
   1.3. Direct Sums & Eigenspaces
   1.4. Singular Values
2. The Polar Decomposition
   2.1. The Polar Decomposition for Endomorphisms
   2.2. The General Polar Decomposition
3. The Singular Value Decomposition
   3.1. The SVD for Endomorphisms
   3.2. The General SVD
4. The Low Rank Approximation Problem
   4.1. Matrix Norms & Their Generalizations
   4.2. The Min-max Theorem & Weyl's Inequality for Singular Values
   4.3. Low Rank Approximation
   4.4. Discussion
Acknowledgments
Bibliography Guide
References

Date: September 3. Contact: mantas@uchicago.edu.
1. Background Concepts

Notation 1.1. Unless otherwise stated, throughout this section U and V refer to finite-dimensional Hermitian inner product spaces.

Notation 1.2. Suppose we have bases u = {u_1, u_2, ..., u_n} and v = {v_1, v_2, ..., v_m} of U and V respectively. Given a vector x ∈ U, we denote the coordinate vector of x with respect to u by

    [x]_u := (x_1, x_2, ..., x_n)^T.

For a linear map f: U → V, we denote the matrix of f with respect to u and v by

    [f]_{u,v} := ( [f(u_1)]_v  [f(u_2)]_v  ...  [f(u_n)]_v ).

1.1. Self-adjoint Linear Maps.

Theorem 1.3. For every linear map f: U → V, there exists a unique linear map g: V → U such that

    ∀x ∈ U, ∀y ∈ V: ⟨f(x), y⟩_V = ⟨x, g(y)⟩_U.

Proof. Let f: U → V be a linear map, let u = {u_1, u_2, ..., u_n} be an orthonormal basis for U, and let v = {v_1, v_2, ..., v_m} be an orthonormal basis for V. Let x ∈ U, and let y ∈ V. With an asterisk denoting the conjugate transpose of a matrix, we have

    ⟨f(x), y⟩_V = ([f]_{u,v} [x]_u)^* [y]_v = [x]_u^* [f]_{u,v}^* [y]_v.

We want to find a matrix G, which can be interpreted as a linear map with respect to v and u, such that

    [x]_u^* (G [y]_v) = [x]_u^* [f]_{u,v}^* [y]_v.

Recall that, for matrices G and F, if Gx = Fx for all vectors x, then G = F. Therefore, G must equal [f]_{u,v}^*. The unique linear map g is the matrix G interpreted as a linear map with respect to the bases v and u.

For readability, we will omit subscripts on inner products in the rest of this paper.

Definition 1.4. Let f: U → V be a linear map. We define the adjoint of f as the unique linear map f*: V → U satisfying

    ∀x ∈ U, ∀y ∈ V: ⟨f(x), y⟩ = ⟨x, f*(y)⟩.

If f = f*, we say f is self-adjoint. Note that in this case, U and V must be the same space. Additionally, when f: V → V is self-adjoint and V is a Hermitian inner product space, we say f is Hermitian.

We will now state some properties of the adjoint operator, which can be proven by converting to matrices and proving the corresponding properties of the conjugate transpose.

Proposition 1.5. Let f: U → V, g: U → V, and h: V → W be linear maps.
Then we have

    (f + g)* = f* + g*,
    (h ∘ f)* = f* ∘ h*.

We will also make use of the following property of self-adjoint operators.

Proposition 1.6. If f: V → V is self-adjoint, then it has real eigenvalues.

Proof. Let λ be an eigenvalue of f, and let v be an associated eigenvector. Then

    λ̄⟨v, v⟩ = ⟨λv, v⟩ = ⟨f(v), v⟩ = ⟨v, f(v)⟩ = ⟨v, λv⟩ = λ⟨v, v⟩.

Since v ≠ 0, we have ⟨v, v⟩ > 0, so λ̄ = λ. Therefore, λ ∈ R.
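As a quick numerical illustration of Theorem 1.3 and Proposition 1.6 (a sketch using numpy, not part of the paper's formal development), the adjoint of a matrix in orthonormal bases is its conjugate transpose, and a Hermitian matrix has a real spectrum:

```python
import numpy as np

rng = np.random.default_rng(0)

# A random complex matrix, viewed as a linear map f: C^4 -> C^3 in the
# standard bases; its adjoint is then the conjugate transpose (Theorem 1.3).
F = rng.standard_normal((3, 4)) + 1j * rng.standard_normal((3, 4))
x = rng.standard_normal(4) + 1j * rng.standard_normal(4)
y = rng.standard_normal(3) + 1j * rng.standard_normal(3)

# <f(x), y> = <x, f*(y)>, with <a, b> = a* b; np.vdot conjugates its
# first argument, matching the paper's convention.
lhs = np.vdot(F @ x, y)
rhs = np.vdot(x, F.conj().T @ y)
assert np.isclose(lhs, rhs)

# A Hermitian matrix has real eigenvalues (Proposition 1.6).
B = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
H = B + B.conj().T                # Hermitian by construction
eigvals = np.linalg.eigvals(H)    # general solver, no Hermitian assumption
assert np.allclose(eigvals.imag, 0, atol=1e-8)
```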
1.2. The Spectral Theorem.

Theorem 1.7. If f: V → V is Hermitian, then f has an orthonormal eigenbasis. That is, there exists an orthonormal basis of V consisting of eigenvectors of f. We denote this basis u = {u_1, u_2, ..., u_n}, indexed such that the associated eigenvalues are ordered from greatest to least: λ_1 ≥ λ_2 ≥ ... ≥ λ_n.

Proposition 1.8. If f: V → V has an orthonormal eigenbasis and all the eigenvalues of f are real, then f is Hermitian.

Proof. Let u = {u_1, u_2, ..., u_n} be the orthonormal eigenbasis. Then [f]_{u,u} is a real diagonal matrix, so its conjugate transpose [f]_{u,u}^* equals [f]_{u,u}. This is only the case if f* = f.

1.3. Direct Sums & Eigenspaces.

Definition 1.9. For a vector space A and subsets A_1, A_2, ..., A_n ⊆ A, we define the sum of the A_i as

    Σ_{i=1}^n A_i := { Σ_{i=1}^n a_i | a_1 ∈ A_1, a_2 ∈ A_2, ..., a_n ∈ A_n }.

Definition 1.10. Let A be a vector space over a field F, and let A_1, A_2, ..., A_n be subspaces of A. We say that A is a direct sum of the A_i if every x ∈ A can be written uniquely as a sum Σ_{i=1}^n a_i, where each a_i is an element of the corresponding A_i. To denote that A is a direct sum of the A_i, we write

    A = ⊕_{i=1}^n A_i.

The following are a few well-known properties of the direct sum. For their proofs, see [2].

Propositions 1.11.
(1) A vector space A is a direct sum of subspaces A_1, A_2, ..., A_k ⊆ A if we have A_i ∩ Σ_{j≠i} A_j = {0} for each i, and A = Σ_j A_j.
(2) If A is a direct sum of A_1, A_2, ..., A_n ⊆ A, then dim A = Σ_{i=1}^n dim A_i.
(3) If B is a subspace of a Hermitian inner product space A, then B ⊕ B^⊥ = A.

We will now prove a few small lemmas that will be used in proving a lemma to the polar decomposition.

Proposition 1.12. Let C be a Hermitian inner product space, and let A, B ⊆ C be subspaces such that A ⊕ B = C and A ⊥ B. Then A = B^⊥.

Proof. Since A ⊥ B, we have A ⊆ B^⊥. For the other inclusion, let x ∈ B^⊥, so that ⟨x, y⟩ = 0 for all y ∈ B. We know x = a + b for some a ∈ A and b ∈ B. Suppose for contradiction that b ≠ 0. Then we have

    ⟨x, b⟩ = ⟨a, b⟩ + ⟨b, b⟩ = 0 + ⟨b, b⟩ > 0.

This contradicts x ∈ B^⊥, so b = 0, and x = a ∈ A. Therefore, A = B^⊥.
Proposition 1.13. Let f: V → V be a Hermitian linear map, and let E_i denote the eigenspace for the eigenvalue λ_i. Then E_i ⊥ E_j for j ≠ i.

Proof. Let x ∈ E_i and let y ∈ E_j with j ≠ i. Since the eigenvalues of f are real (Proposition 1.6), we have

    λ_j⟨x, y⟩ = ⟨x, λ_j y⟩ = ⟨x, f(y)⟩ = ⟨f(x), y⟩ = ⟨λ_i x, y⟩ = λ_i⟨x, y⟩.

This gives us (λ_i − λ_j)⟨x, y⟩ = 0. Since λ_i ≠ λ_j, we know ⟨x, y⟩ = 0. Therefore x ⊥ E_j, and E_i ⊥ E_j.

Proposition 1.14. Let f: V → V be a Hermitian linear map, where dim(V) = n. Then V is a direct sum of the eigenspaces of f.
Proof. We know f has k distinct eigenspaces, which are perpendicular to each other by Proposition 1.13, and whose dimensions sum to n. Let i ∈ [k]. Any nonzero element of Σ_{j≠i} E_j is a sum of eigenvectors with eigenvalues different from λ_i. These eigenvectors and any nonzero v ∈ E_i are linearly independent, so their sum cannot equal v. Therefore, E_i ∩ Σ_{j≠i} E_j = {0}. We can find orthonormal bases for each eigenspace separately. Put together, these bases form a set of n orthonormal vectors in V, so they must be an orthonormal basis of V, which tells us that V is the sum of the eigenspaces of f. By Proposition 1.11(1), we know V is a direct sum of the eigenspaces of f.

Though we only require the previous two propositions for Hermitian linear maps, they also apply to normal linear maps, for which f ∘ f* = f* ∘ f.

Proposition 1.15. Let f: V → V and g: V → V be Hermitian linear maps. Suppose that, for every λ, {x ∈ V | f(x) = λx} is an eigenspace of f if and only if {x ∈ V | g(x) = λx} is an eigenspace of g. Then f = g.

Proof. By the spectral theorem, f has an orthonormal eigenbasis u := {u_1, u_2, ..., u_n} that uniquely determines the action of f on V. By the premise, these are eigenvectors of g with the same eigenvalues. Since these vectors form a basis, they uniquely determine the action of g on V as the same as that of f. Therefore, f = g.

1.4. Singular Values.

Proposition 1.16. For any linear map f: U → V, the compositions f* ∘ f and f ∘ f* are self-adjoint.

Proof. Let f: U → V be a linear map. For all x, y ∈ U, we have

    ⟨(f* ∘ f)(x), y⟩ = ⟨f*(f(x)), y⟩ = ⟨f(x), f(y)⟩ = ⟨x, f*(f(y))⟩ = ⟨x, (f* ∘ f)(y)⟩.

Thus, f* ∘ f is self-adjoint. Showing that f ∘ f* is self-adjoint follows analogously.

Proposition 1.17. If f: U → V is a linear map, then f* ∘ f and f ∘ f* have non-negative, real eigenvalues.

Proof. Since f* ∘ f and f ∘ f* are Hermitian, they have real eigenvalues. Let λ be an eigenvalue of f* ∘ f, and let v be an associated eigenvector. We have

    λ⟨v, v⟩ = ⟨v, λv⟩ = ⟨v, (f* ∘ f)(v)⟩ = ⟨f(v), f(v)⟩.
Since the Hermitian inner product is positive definite, ⟨f(v), f(v)⟩ is non-negative, and ⟨v, v⟩ is positive, because v is nonzero by definition. Therefore, we can divide by ⟨v, v⟩, and λ must be non-negative. The proof for f ∘ f* is analogous.

Note that this is equivalent to the statement that f* ∘ f and f ∘ f* are positive semi-definite. In other words, the inner product ⟨v, (f* ∘ f)(v)⟩ is real and non-negative for all v ∈ U, and likewise for f ∘ f*.
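Propositions 1.16 and 1.17 are easy to spot-check numerically. The sketch below (numpy, purely illustrative) forms A*A and AA* for a random complex matrix and confirms they are Hermitian with a non-negative spectrum:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3)) + 1j * rng.standard_normal((5, 3))

# f* o f corresponds to A*A (3x3) and f o f* to AA* (5x5); both are
# Hermitian with non-negative real eigenvalues (Props. 1.16 and 1.17).
AhA = A.conj().T @ A
AAh = A @ A.conj().T
assert np.allclose(AhA, AhA.conj().T)              # Hermitian
assert np.all(np.linalg.eigvalsh(AhA) >= -1e-10)   # non-negative spectrum
assert np.all(np.linalg.eigvalsh(AAh) >= -1e-10)   # (AA* has extra zeros)
```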
Lemma 1.18. If f: U → V is a linear map, then we have

    ker(f) = ker(f* ∘ f),      ker(f*) = ker(f ∘ f*),
    ker(f) = (im(f*))^⊥,       ker(f*) = (im(f))^⊥,
    rk(f* ∘ f) = rk(f) = rk(f*) = rk(f ∘ f*).

Proof. Let f: U → V be a linear map, with dim(U) = n and dim(V) = m. First, we will show ker(f) = ker(f* ∘ f). Let x ∈ ker(f). Then f*(f(x)) = f*(0) = 0, so x ∈ ker(f* ∘ f). Therefore, ker(f) ⊆ ker(f* ∘ f). Now let x ∈ ker(f* ∘ f). We have

    ⟨f(x), f(x)⟩ = ⟨x, (f* ∘ f)(x)⟩ = ⟨x, 0⟩ = 0.

Since the inner product is positive definite, we know f(x) = 0, so ker(f* ∘ f) ⊆ ker(f). Therefore, ker(f) = ker(f* ∘ f). The proof that ker(f*) equals ker(f ∘ f*) is analogous.

To see ker(f) = (im(f*))^⊥, let x ∈ ker(f) and let y ∈ V. We have

    ⟨x, f*(y)⟩ = ⟨f(x), y⟩ = ⟨0, y⟩ = 0,

so x ⊥ f*(y) for all y ∈ V. Therefore, x ∈ (im(f*))^⊥. Now let x ∈ (im(f*))^⊥. Since f*(f(x)) ∈ im(f*), we have

    0 = ⟨x, (f* ∘ f)(x)⟩ = ⟨f(x), f(x)⟩,

which is only the case when f(x) = 0, so x ∈ ker(f). Therefore, ker(f) = (im(f*))^⊥. The proof that ker(f*) equals (im(f))^⊥ is analogous.

The nullities of f and f* ∘ f are equal, and likewise for f* and f ∘ f*. By the rank-nullity theorem, we know rk(f) = rk(f* ∘ f) and rk(f*) = rk(f ∘ f*). All that remains is to show rk(f) = rk(f*). We proved (ker(f))^⊥ = im(f*), and by Proposition 1.11(3) we have ker(f) ⊕ (ker(f))^⊥ = U. Therefore, by Proposition 1.11(2) we have dim(ker(f)) + dim(im(f*)) = n. By the rank-nullity theorem, dim(im(f)) = dim(im(f*)), that is, rk(f) = rk(f*).

The following theorem is an important prerequisite for the definition of singular values.

Theorem 1.19. If f: U → V is a linear map of rank r, then there are exactly r nonzero eigenvalues of f* ∘ f, and r nonzero eigenvalues of f ∘ f*, counted with multiplicity.

Proof. By Lemma 1.18, we know f* ∘ f and f ∘ f* have rank r. By the rank-nullity theorem, the dimension of ker(f) and ker(f* ∘ f) is n − r. Since the kernel of a linear map is equal to its eigenspace associated with the eigenvalue 0, we know the geometric multiplicity of the eigenvalue 0 for f* ∘ f is n − r.
Since f* ∘ f is Hermitian, it is diagonalizable, so its eigenvalues have the same algebraic and geometric multiplicity. Therefore, there are r nonzero, and hence positive, eigenvalues of f* ∘ f. The same follows by analogous reasoning for f ∘ f*.

Definition 1.20. The singular values of a linear map f: U → V of rank r are the strictly positive square roots of the nonzero eigenvalues of f* ∘ f, denoted σ_1 ≥ σ_2 ≥ ... ≥ σ_r > 0.

Remark 1.21. The eigenvalues of f* ∘ f and f ∘ f* are in fact equal, with the same multiplicities. We will see a proof of this later in the paper, in Proposition 3.5.

2. The Polar Decomposition

In this section, we will prove the polar decomposition for linear maps between Hermitian inner product spaces of equal dimension. The generalization to arbitrary linear maps is proved analogously, but introduces complexities unnecessary for a conceptual understanding of the topic. It is stated without proof at the end of this section. The structure of the proofs in this section, of Lemma 1.18, and of the proofs in section three comes from [3].
Lemma 2.1. Let V be a Hermitian inner product space of dimension n. Let f: V → V be a positive semi-definite, self-adjoint linear map. Then there exists a unique positive semi-definite, self-adjoint linear map h: V → V such that h ∘ h = f. Furthermore, if µ_1, µ_2, ..., µ_p are the distinct eigenvalues of h, and E_i is the eigenspace associated with µ_i, then µ_1², µ_2², ..., µ_p² are the distinct eigenvalues of f, and E_i is the eigenspace associated with µ_i². Finally, ker(f) = ker(h).

Proof. By the spectral theorem, f has an orthonormal eigenbasis u := {u_1, u_2, ..., u_n} with associated eigenvalues λ_1, λ_2, ..., λ_n. Since f is positive semi-definite, its eigenvalues are non-negative, which lets us define σ_i as the positive square root of λ_i. We now define a linear map h: V → V by mapping each u_i to σ_i u_i. This gives us h ∘ h = f, and the squared eigenvalues of h equal the eigenvalues of f, so h is positive semi-definite. What's more, h has an orthonormal eigenbasis by its definition, and since the eigenvalues of f are non-negative, the eigenvalues of h are real. By Proposition 1.8, h is self-adjoint.

Since u is an orthonormal eigenbasis for both f and h, we know σ_i has the same multiplicity as λ_i for each i. Therefore, if µ_1, µ_2, ..., µ_p are the distinct eigenvalues of h, then µ_1², µ_2², ..., µ_p² are the distinct eigenvalues of f. We now consider the eigenspaces

    E_i := {x ∈ V | h(x) = µ_i x},    E'_i := {x ∈ V | f(x) = µ_i² x}.

Fix k ∈ [p]. To see E_k ⊆ E'_k, let x ∈ E_k. We have h(x) = µ_k x, so (h ∘ h)(x) = µ_k² x. Since f = h ∘ h, this gives us x ∈ E'_k. For the other inclusion, since h is Hermitian, we know by Propositions 1.11 and 1.14 that

    (⊕_{i≠k} E_i) ⊕ E_k = V    and    (E_k)^⊥ ⊕ E_k = V.

By Proposition 1.12, we have

    (E_k)^⊥ = ⊕_{i≠k} E_i.

Thus, we want to show

    E'_k ⊥ ⊕_{i≠k} E_i.

Fix x ∈ E'_k, and let y be in the direct sum of the E_i for i ≠ k. We have y = Σ_{i≠k} y_i for y_i ∈ E_i. Since we have already proven E_i ⊆ E'_i, we know y_i ∈ E'_i.
Therefore, we have

    ⟨x, y⟩ = ⟨x, Σ_{i≠k} y_i⟩ = Σ_{i≠k} ⟨x, y_i⟩ = Σ_{i≠k} 0 = 0,

since E'_k ⊥ E'_i for i ≠ k by Proposition 1.13. Therefore, x is perpendicular to y, so x is perpendicular to the direct sum of the E_i for i ≠ k. In other words, x is in the orthogonal complement of this direct sum, which is just E_k. Therefore, E_k = E'_k.

Clearly, ker(h) ⊆ ker(f). To see the other inclusion, let x ∈ ker(f) with x ≠ 0. Then x must be an eigenvector of f with eigenvalue 0. As we just proved, it is also an eigenvector of h with eigenvalue 0, so x ∈ ker(h). Therefore, ker(h) = ker(f).

All that remains is to show uniqueness. Suppose there exists a positive semi-definite, self-adjoint linear map g: V → V such that g ∘ g = f. By the spectral theorem, g has an orthonormal eigenbasis. By the same argument that was used with h, we know g and f have the same eigenspaces, with the eigenvalues of g the positive square roots of those of f, so g and h have the same eigenspaces with the same eigenvalues. By Proposition 1.15, g = h.
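The construction in the proof of Lemma 2.1 translates directly into a few lines of numpy. The sketch below (illustrative, not part of the formal development) builds the unique positive semi-definite square root of a PSD Hermitian matrix exactly as the proof does, via an orthonormal eigenbasis:

```python
import numpy as np

rng = np.random.default_rng(3)

# Build a positive semi-definite Hermitian F, then form its square root H
# as in Lemma 2.1: send each eigenvector u_i to sqrt(lambda_i) * u_i.
B = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
F = B.conj().T @ B                       # PSD Hermitian by construction

lam, U = np.linalg.eigh(F)               # orthonormal eigenbasis, real eigenvalues
H = U @ np.diag(np.sqrt(np.clip(lam, 0, None))) @ U.conj().T

assert np.allclose(H, H.conj().T)                 # h is Hermitian
assert np.all(np.linalg.eigvalsh(H) >= -1e-10)    # and positive semi-definite
assert np.allclose(H @ H, F)                      # h o h = f
```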
2.1. The Polar Decomposition for Endomorphisms.

Theorem 2.2 (The Polar Decomposition for Endomorphisms). Let V be an n-dimensional Hermitian inner product space. For any linear map f: V → V, there are two positive semi-definite, self-adjoint linear maps h_1: V → V and h_2: V → V, and a unitary linear map g: V → V, such that f = g ∘ h_1 = h_2 ∘ g. Furthermore, if f has rank r, the maps h_1 and h_2 have the same positive eigenvalues σ_1, σ_2, ..., σ_r, which are the singular values of f. Finally, h_1 and h_2 are unique, g is unique if f is invertible, and h_1 = h_2 if f is normal.

Proof. The composition f* ∘ f is a positive semi-definite, self-adjoint linear map, so by Lemma 2.1 there exists a unique positive semi-definite, self-adjoint linear map h_1: V → V satisfying h_1 ∘ h_1 = f* ∘ f. In the lemma, we also established the existence of an orthonormal eigenbasis u := {u_1, u_2, ..., u_n} for h_1. We arrange the indices such that µ_1 ≥ µ_2 ≥ ... ≥ µ_n, where µ_i is the eigenvalue associated with u_i. Let r be the rank of f. By Theorem 1.19, we know µ_1, µ_2, ..., µ_r are the singular values of f, and µ_{r+1}, µ_{r+2}, ..., µ_n are all zero.

We now wish to find a g such that f = g ∘ h_1. A hint for where to start comes if we suppose h_1 is invertible. Then we expect f ∘ h_1^{-1} to equal g. Thinking back to how h_1 acts on the u_i, we see that h_1^{-1}(u_i) would equal u_i/µ_i, for positive µ_i. Therefore, we expect g(u_i) to equal f(u_i)/µ_i for positive µ_i. Taking advantage of this knowledge, we can define a new orthonormal basis of V. For i ∈ [r], we let v_i = f(u_i)/µ_i. These vectors are orthonormal already. To see this, let i, j ∈ [r]. Then

    ⟨v_i, v_j⟩ = (1/(µ_i µ_j)) ⟨f(u_i), f(u_j)⟩ = (1/(µ_i µ_j)) ⟨(f* ∘ f)(u_i), u_j⟩ = (1/(µ_i µ_j)) µ_i² ⟨u_i, u_j⟩ = (µ_i/µ_j) δ_ij = δ_ij.

Therefore, we can extend the v_i to an orthonormal basis of V. Let g be defined as the linear map sending each u_i to v_i. Since g maps one orthonormal basis to another, it is unitary.
Defining g in this way also leads to the desired equality f = g ∘ h_1. To see this, let i ∈ [r]. Then

    (g ∘ h_1)(u_i) = g(µ_i u_i) = µ_i g(u_i) = µ_i v_i = f(u_i).

For r < i ≤ n, we have

    ‖f(u_i)‖² = ⟨f(u_i), f(u_i)⟩ = ⟨(f* ∘ f)(u_i), u_i⟩ = ⟨0, u_i⟩ = 0,

so f(u_i) must equal the zero vector. This gives us (g ∘ h_1)(u_i) = g(0) = 0 = f(u_i). Therefore (g ∘ h_1)(u_i) = f(u_i) for all i ∈ [n], and therefore for all vectors in V.

Now we turn to h_2. Because g is unitary, it must be invertible. Defining h_2 as f ∘ g^{-1}, we obtain f = g ∘ h_1 = h_2 ∘ g. Therefore, we have g ∘ h_1 ∘ g^{-1} = h_2. The adjoint of g ∘ h_1 ∘ g^{-1} is equal to itself, so h_2 is self-adjoint. Also, from the definition of v_i, we know that for i ∈ [r] we have h_2(v_i) = f(u_i) = µ_i v_i. For r < i ≤ n, we have h_2(v_i) = f(u_i) = 0 = µ_i v_i. Therefore, h_1 and h_2 have the same positive eigenvalues, which are the singular values of f.

To see that h_2 is unique, suppose there is a positive semi-definite, self-adjoint linear map h_3 such that f = h_3 ∘ g. We have

    f ∘ f* = (h_2 ∘ g) ∘ (h_2 ∘ g)* = h_2 ∘ g ∘ g* ∘ h_2 = h_2 ∘ h_2,

and likewise f ∘ f* = h_3 ∘ h_3. By Lemma 2.1, there is a unique positive semi-definite, self-adjoint linear map whose square is f ∘ f*, so h_2 = h_3.

Finally, if f is invertible, then it has full rank, so its nullity is zero. By Lemmas 1.18 and 2.1, we have ker(h_1) = ker(f* ∘ f) = ker(f), so the nullity of h_1 is zero, h_1 has full rank, and it is invertible. We can then write g = f ∘ h_1^{-1}. Since h_1 is unique, g is unique. If f is normal, then we have f* ∘ f = f ∘ f*, so h_1 ∘ h_1 = h_2 ∘ h_2, and by the uniqueness in Lemma 2.1, h_1 = h_2.

The polar decomposition can be stated in matrix form as follows.
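As a numerical sanity check on Theorem 2.2 (a sketch only: it computes the factors through numpy's SVD, which this paper derives from the polar decomposition, so this is a check rather than a derivation):

```python
import numpy as np

def polar(F):
    """Polar factors F = R S, mirroring Theorem 2.2: with F = U diag(s) Vh,
    take R = U Vh (unitary, the map g) and S = V diag(s) Vh (PSD Hermitian,
    the map h_1, i.e. the square root of F*F)."""
    U, s, Vh = np.linalg.svd(F)
    return U @ Vh, Vh.conj().T @ np.diag(s) @ Vh

rng = np.random.default_rng(4)
F = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
R, S = polar(F)

assert np.allclose(R @ S, F)                      # f = g o h_1
assert np.allclose(R.conj().T @ R, np.eye(4))     # g unitary
assert np.allclose(S, S.conj().T)                 # h_1 self-adjoint
assert np.all(np.linalg.eigvalsh(S) >= -1e-10)    # h_1 positive semi-definite
```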
Theorem 2.3. For every complex n × n matrix A, there is an n × n unitary matrix R and an n × n positive semi-definite, Hermitian matrix S such that A = RS. Furthermore, R and S are unique if A is invertible.

2.2. The General Polar Decomposition. Let U and V be Hermitian inner product spaces of dimensions n and m respectively.

Definition 2.4. A linear map f: U → V is weakly unitary if it has rank s = min(m, n), and if f* ∘ f = id on (ker(f))^⊥.

Theorem 2.5 (The Polar Decomposition). For any linear map f: U → V, there are two positive semi-definite, self-adjoint linear maps h_1: U → U and h_2: V → V, and a weakly unitary linear map g: U → V, such that f = g ∘ h_1 = h_2 ∘ g. Furthermore, if f has rank r, the maps h_1 and h_2 have the same positive eigenvalues σ_1, σ_2, ..., σ_r, which are the singular values of f. Finally, h_1 and h_2 are unique, g is unique if the rank of f is min{m, n}, and h_1 = h_2 if f is normal.

3. The Singular Value Decomposition

3.1. The SVD for Endomorphisms.

Lemma 3.1. Let V be an n-dimensional Hermitian inner product space. For any linear map f: V → V, there exist two orthonormal bases u = {u_1, u_2, ..., u_n} and v = {v_1, v_2, ..., v_n} such that [f]_{u,v} is diagonal, of the form

    diag(µ_1, µ_2, ..., µ_n),

where the first r of the µ_i are equal to σ_1, σ_2, ..., σ_r, the singular values of f, and the remaining n − r of the µ_i are zero.

Proof. In the proof of the polar decomposition, we specified two orthonormal bases u = {u_1, u_2, ..., u_n} and v = {v_1, v_2, ..., v_n}. Let r be the rank of f. Then, for i ∈ [r], we have f(u_i) = µ_i v_i. Expressing this as a coordinate vector with respect to the basis v gives µ_i in the i-th coordinate, and zero everywhere else. For i greater than r, we have f(u_i) = 0, so the coordinate vector with respect to the basis v is the zero vector. Therefore, we have

    [f]_{u,v} = diag(µ_1, µ_2, ..., µ_n).

Theorem 3.2. Let M be an n × n complex matrix.
There exist two n × n unitary matrices U and V, and a positive semi-definite diagonal matrix D, such that

    M = UDV.

Furthermore, D is of the form

    D = diag(µ_1, µ_2, ..., µ_n),
where the first r of the µ_i are equal to σ_1, σ_2, ..., σ_r, the singular values of M, and the remaining n − r of the µ_i are zero.

Proof. Let u and v be defined as before. By considering M as the linear map m with respect to the standard basis e, we see that this becomes a change of basis problem. Namely, we are trying to convert [m]_{u,v} into [m]_{e,e}. This can be done with the change of basis transformations σ: C^n → C^n and τ: C^n → C^n, defined by

    σ(u_i) = e_i,    τ(v_i) = e_i.

The rank of both σ and τ is equal to n, so they are invertible, and we have σ^{-1}(e_i) = u_i and τ^{-1}(e_i) = v_i. We see then that [σ^{-1}]_{e,e} is the unitary matrix with column vectors, from left to right, [u_1]_e, [u_2]_e, ..., [u_n]_e. Likewise, [τ^{-1}]_{e,e} is the unitary matrix with column vectors, from left to right, [v_1]_e, [v_2]_e, ..., [v_n]_e. These matrices are equal to [σ]^{-1}_{e,e} and [τ]^{-1}_{e,e}, respectively. Now, [σ]_{e,e} converts e-coordinates into u-coordinates, [m]_{u,v} carries u-coordinates to v-coordinates, and [τ]^{-1}_{e,e} converts v-coordinates back into e-coordinates. Therefore, if we let U := [τ]^{-1}_{e,e}, D := [m]_{u,v}, and V := [σ]_{e,e}, a change of bases gives

    M = [m]_{e,e} = [τ]^{-1}_{e,e} [m]_{u,v} [σ]_{e,e} = UDV,

where U and V are unitary because σ and τ map one orthonormal basis to another.

3.2. The General SVD.

Lemma 3.3. Let U and V be Hermitian inner product spaces of dimension n and m respectively. Let f: U → V be a linear map with rank r, and let s = min(m, n). Then there exist two orthonormal bases u = {u_1, u_2, ..., u_n} and v = {v_1, v_2, ..., v_m} such that [f]_{u,v} is an m × n rectangular diagonal matrix of one of the forms

    D_1 = [ diag(µ_1, ..., µ_n) ; 0 ]  (a diagonal block stacked above an (m − n) × n zero block, when m ≥ n), or
    D_2 = [ diag(µ_1, ..., µ_m) | 0 ]  (a diagonal block beside an m × (n − m) zero block, when m ≤ n),

where the first r of the µ_i equal σ_1, σ_2, ..., σ_r, the singular values of f, and the remaining s − r of the µ_i are zero.

Theorem 3.4. Let M be an m × n complex matrix. There exist an m × m unitary matrix U, an n × n unitary matrix V, and an m × n positive semi-definite rectangular diagonal matrix D, such that M = UDV. Furthermore, D is of one of the forms specified in Lemma 3.3, depending on which of m and n is greater.
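Theorem 3.4 corresponds to what numerical libraries compute directly. A short numpy sketch (illustrative only; note numpy returns the factor on the right already conjugate-transposed, here called `Vh`):

```python
import numpy as np

rng = np.random.default_rng(5)
M = rng.standard_normal((3, 5)) + 1j * rng.standard_normal((3, 5))

# Full SVD of a rectangular matrix: a 3x3 unitary U, a 5x5 unitary Vh,
# and the singular values; D is the 3x5 rectangular diagonal matrix.
U, s, Vh = np.linalg.svd(M, full_matrices=True)
D = np.zeros((3, 5), dtype=complex)
np.fill_diagonal(D, s)

assert np.allclose(U @ D @ Vh, M)                  # M = U D Vh
assert np.allclose(U.conj().T @ U, np.eye(3))      # U unitary
assert np.allclose(Vh.conj().T @ Vh, np.eye(5))    # Vh unitary
```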
Before moving on, we return to proving that f* ∘ f and f ∘ f* have the same nonzero eigenvalues with the same multiplicities, and hence that f and f* have the same singular values with the same multiplicities.

Proposition 3.5. Let f: U → V be a linear map between Hermitian inner product spaces U and V. Then f* ∘ f and f ∘ f* have the same nonzero eigenvalues with the same multiplicities.

Proof. By Theorem 3.4, we have [f]_{e,e} = UDV, where U is an m × m unitary matrix, V is an n × n unitary matrix, and D is an m × n rectangular diagonal matrix. Let F denote [f]_{e,e}. We have

    F*F = V*D*U*UDV = V^{-1}D^T DV.
Analogously, we have FF* = UDD^T U^{-1}. In other words, F*F is unitarily similar to D^T D, and FF* is unitarily similar to DD^T. Therefore, F*F and D^T D share the same eigenvalues with the same multiplicities, and likewise FF* and DD^T. Since D^T D and DD^T visibly share the same nonzero eigenvalues µ_i² with the same multiplicities, we can conclude that F*F and FF* share the same nonzero eigenvalues with the same multiplicities.

4. The Low Rank Approximation Problem

We will now explore an application of the SVD to low rank approximation of matrices. The goal of low rank approximation is to find a matrix B that is closest to an initial matrix A, subject to the constraint that the rank of B is at most q ∈ N. In other words, given a matrix norm ‖·‖ and a q ∈ N, we wish to minimize ‖A − B‖ subject to the constraint rk(B) ≤ q. The traditional solution obtained from the SVD, where A = UDV, is to set B equal to B_q := UD_qV, where D_q is obtained from D by zeroing out its s − q smallest entries.

An interesting question is how exactly B_q attains closeness to A. Is it the closest low rank approximation with respect to all matrix norms? If not all matrix norms, then which subset? To start, we can notice that B_q is essentially the same as A, but with the smallest s − q singular values set to zero. Therefore, we expect low rank approximation to hold for norms for which ‖B − A‖ measures closeness of the largest singular values of A and B.

4.1. Matrix Norms & Their Generalizations.

Notation 4.1. Unless otherwise stated, throughout this section we let A be an m × n complex matrix of rank r, with s := min(m, n). We let σ_i(A) denote the i-th singular value of A if i ∈ [r], and if r < i ≤ s, we let σ_i(A) equal 0. Additionally, we let λ_i(A) denote the i-th eigenvalue of A, where λ_1(A) ≥ λ_2(A) ≥ ... ≥ λ_s(A). We will use similar notation for the singular values and eigenvalues of other matrices.

Definitions 4.2. For p, k ∈ N such that k ≤ s, we have the following definitions.
Schatten p-norm:    ‖A‖_p^S := ( Σ_{i=1}^s σ_i^p(A) )^{1/p}.

Ky Fan k-norm:      ‖A‖_k^K := Σ_{i=1}^k σ_i(A).

Notice that the Schatten p-norm is an L^p norm of the singular values, which is a smooth measure of the largest singular values, biasing towards larger singular values as p goes to infinity. On the other hand, the Ky Fan k-norm is a discrete measure of the k largest singular values. In other words, these norms are the norms we would expect to work with B_q.

The norm for which we will prove the low rank approximation theorem is the natural generalization of the Schatten p-norm and Ky Fan k-norm.¹ Namely, for p, k ∈ N such that k ≤ s we define

    ‖A‖_{(p,k)} := ( Σ_{i=1}^k σ_i^p(A) )^{1/p}.

We will refer to this norm as the (p, k) norm, but we have yet to prove that it is, in fact, a norm. This becomes easier with the help of a few trace identities. Before proving these identities, however, we require some knowledge about sums of eigenvalues.

Definition 4.3. Let A be an n × n Hermitian matrix, and let V be an m-dimensional subspace of C^n. We define the partial trace tr(A|_V) as

    tr(A|_V) := Σ_{i=1}^m v_i* A v_i,

where v_1, v_2, ..., v_m form any orthonormal basis of V.

¹For more information regarding these three norms, see [4] and [8].
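All three norms of Definitions 4.2 are one function of the singular values. A small numpy sketch (illustrative only) implements the (p, k) norm and checks that it specializes to the familiar cases:

```python
import numpy as np

def pk_norm(A, p, k):
    """The (p, k) norm: the L^p norm of the k largest singular values.
    k = s recovers the Schatten p-norm; p = 1 recovers the Ky Fan k-norm."""
    sigma = np.linalg.svd(A, compute_uv=False)   # descending order
    return np.sum(sigma[:k] ** p) ** (1.0 / p)

rng = np.random.default_rng(6)
A = rng.standard_normal((4, 5)) + 1j * rng.standard_normal((4, 5))
s = np.linalg.svd(A, compute_uv=False)

assert np.isclose(pk_norm(A, 2, 4), np.linalg.norm(A, 'fro'))  # Schatten-2 = Frobenius
assert np.isclose(pk_norm(A, 1, 2), s[0] + s[1])               # Ky Fan 2-norm
assert np.isclose(pk_norm(A, 1, 1), np.linalg.norm(A, 2))      # spectral norm
```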
The partial trace is invariant to the choice of orthonormal basis. This can be shown by writing one orthonormal basis as another orthonormal basis times a unitary matrix, and solving for the partial trace. Therefore, the partial trace is well-defined.

Proposition 4.4 (Extremal partial trace). Let A be an n × n Hermitian matrix, and let k ∈ [n] be given. We have

    Σ_{i=1}^k λ_i(A) = sup_{V ⊆ C^n, dim(V) = k} tr(A|_V).

A clear proof of the extremal partial trace identity can be found in [7], so we will omit it here.

Corollary 4.5 (Schur-Horn inequality). Let A be an n × n Hermitian matrix, and let k ∈ [n] be given. Let {i_1, i_2, ..., i_k} ⊆ [n], and let a_ii denote the i-th diagonal element of A. We have

    a_{i_1 i_1} + a_{i_2 i_2} + ... + a_{i_k i_k} ≤ λ_1(A) + λ_2(A) + ... + λ_k(A).

Proof. Let V := span{e_{i_1}, e_{i_2}, ..., e_{i_k}}, where the e_i are standard basis vectors. We have

    tr(A|_V) = a_{i_1 i_1} + a_{i_2 i_2} + ... + a_{i_k i_k},

which is less than or equal to the supremum of tr(A|_V) over all k-dimensional V, and by Proposition 4.4 this supremum equals λ_1(A) + λ_2(A) + ... + λ_k(A).

Definition 4.6. A projection is a linear transformation ρ: V → V such that ρ ∘ ρ = ρ. A projection is orthogonal if V has an inner product and the projection's range and null space are orthogonal with respect to this inner product. One can show that a projection is orthogonal if and only if it is self-adjoint. A good proof of this can be found in [6], so we will leave it unproven here.

Proposition 4.7. Let |A| denote the unique positive semi-definite Hermitian matrix such that |A|² = A*A. Let P_k be the set of rank k orthogonal projection matrices. The following identities hold:

    ‖A‖_p^S = tr(|A|^p)^{1/p},                        (4.8)
    ‖A‖_k^K = max_{P ∈ P_k} tr(P|A|),                 (4.9)
    ‖A‖_{(p,k)} = max_{P ∈ P_k} tr(P|A|^p)^{1/p}.     (4.10)

Proof. First we will prove 4.8. Since |A| is Hermitian, we have λ_i(|A|^p) = λ_i^p(|A|). Hence,

    tr(|A|^p) = Σ_{i=1}^n λ_i(|A|^p) = Σ_{i=1}^n λ_i^p(|A|) = Σ_{i=1}^s σ_i^p(A) = (‖A‖_p^S)^p.

Now we will prove 4.9.
Let ρ be any rank k orthogonal projection. Since ρ is self-adjoint, it has an orthonormal eigenbasis u = {u_1, u_2, ..., u_n}. Let P := [ρ]_{e,e} and let P' := [ρ]_{u,u}. Let A' be the matrix obtained by changing the basis of |A| from e to u. The only eigenvalues of a projection are 0 and 1, where the multiplicity of 1 equals the rank of the projection. Therefore, P' is a diagonal matrix with 1 in k diagonal entries and 0 everywhere else, and the matrix A'P' is equal to A', but with n − k columns zeroed out. Since A' is Hermitian with the same eigenvalues as |A|, Corollary 4.5 lets us conclude tr(A'P') ≤ ‖A‖_k^K. By the properties of the trace, we also have tr(A'P') = tr(P'A') = tr(P|A|).

To see that ‖A‖_k^K is attained by tr(P|A|) for some P, let v = {v_1, v_2, ..., v_n} be an orthonormal eigenbasis of |A|, ordered by decreasing eigenvalue. Define ρ on this basis by letting ρ(v_i) equal v_i for i ∈ [k], and 0 otherwise. We can see that ρ is a rank k orthogonal projection. Defining P' and A' in the same way as before, but with respect to v instead of u, we see that A'P' is a diagonal matrix with trace equal to ‖A‖_k^K.

The proof of 4.10 is a straightforward combination of the proofs of 4.8 and 4.9.

From the previous proposition, one can easily show that the (p, k) norm is positive definite, and that ‖αA‖_{(p,k)} = |α| ‖A‖_{(p,k)} holds for all scalars α ∈ C. All that remains is to prove the triangle inequality. First, however, we will introduce a theorem about symmetric gauge functions.
In short, symmetric gauge functions are functions on real vectors that act like norms. For their full definition, and a proof of the following theorem, see [5].

Notation 4.11. For x, y ∈ R^n with non-negative components, if for every k ∈ [n] the sum of the k largest components of x is less than or equal to the sum of the k largest components of y, we say x ⪯ y.

Theorem 4.12. Let x, y ∈ R^n be element-wise non-negative. Then x ⪯ y if and only if Φ(x) ≤ Φ(y) for all symmetric gauge functions Φ.

Proposition 4.13. For all complex m × n matrices A and B, we have

    ‖A + B‖_{(p,k)} ≤ ‖A‖_{(p,k)} + ‖B‖_{(p,k)}.

Proof. The triangle inequality for the Ky Fan norms states that, for every j ∈ [k],

    Σ_{i=1}^j σ_i(A + B) ≤ Σ_{i=1}^j σ_i(A) + Σ_{i=1}^j σ_i(B).

In other words, writing x := (σ_1(A + B), ..., σ_k(A + B)) and y := (σ_1(A) + σ_1(B), ..., σ_k(A) + σ_k(B)), we have x ⪯ y. Since the L^p norm is a symmetric gauge function¹, Theorem 4.12 gives

    ‖A + B‖_{(p,k)} = ‖x‖_{L^p} ≤ ‖y‖_{L^p}.

By Minkowski's inequality,

    ‖y‖_{L^p} ≤ ‖(σ_1(A), ..., σ_k(A))‖_{L^p} + ‖(σ_1(B), ..., σ_k(B))‖_{L^p} = ‖A‖_{(p,k)} + ‖B‖_{(p,k)}.

4.2. The Min-max Theorem & Weyl's Inequality for Singular Values.

The proof of the low rank approximation theorem for the (p, k) norm requires a few lemmas. One of them is the singular value version of Weyl's inequality. This, in turn, requires the min-max theorem, also known as the Courant-Fischer min-max theorem.

Definition 4.14. Let A be an n × n Hermitian matrix, and let x ∈ C^n with x ≠ 0. We define the Rayleigh quotient of A as

    R_A(x) := (x*Ax)/(x*x).

Note that for a vector x ∈ C^n and a nonzero scalar α ∈ C, we have R_A(αx) = R_A(x).

Lemma 4.15. Let V be a finite dimensional vector space, and let X, Y ⊆ V be subspaces. We have

    dim(X + Y) + dim(X ∩ Y) = dim(X) + dim(Y).

For the sake of conciseness, we will not prove this lemma. See [2] for a full proof.

Theorem 4.16 (The Min-max Theorem). Let A be an n × n Hermitian matrix. Then we have

    λ_i(A) = min_{U ⊆ C^n, dim(U) = n−i+1} max_{x ∈ U, x ≠ 0} R_A(x).

¹This can be seen by comparing the properties of the L^p norm to the definition of symmetric gauge functions provided in [5].
Since R_A(αx) = R_A(x) for any nonzero scalar α, an equivalent statement of the theorem is

    λ_i(A) = min_{U ⊆ C^n, dim(U) = n−i+1} max_{x ∈ U, ‖x‖ = 1} x*Ax.

Proof. First, we will prove the infimum-supremum version of the theorem, starting with

    λ_i(A) ≤ inf_{U ⊆ C^n, dim(U) = n−i+1} sup_{x ∈ U, x ≠ 0} R_A(x).

It suffices to show that for every U ⊆ C^n of dimension n − i + 1, there exists a nonzero x ∈ U such that λ_i(A) ≤ R_A(x). Let U ⊆ C^n have dimension n − i + 1. Where E_j denotes the span of u_j, for an orthonormal eigenbasis u_1, ..., u_n of A ordered by decreasing eigenvalue, we have

    dim(U) + dim(⊕_{j=1}^i E_j) = (n − i + 1) + i > n.

Using Lemma 4.15, we can show U ∩ ⊕_{j=1}^i E_j is nontrivial. Moreover, the Rayleigh quotient of any nonzero x in this intersection is greater than or equal to λ_i(A), since such an x is a linear combination of eigenvectors with eigenvalues at least λ_i(A). This completes the first part of the proof. Now we will show

    inf_{U ⊆ C^n, dim(U) = n−i+1} sup_{x ∈ U, x ≠ 0} R_A(x) ≤ λ_i(A).

It suffices to exhibit a U ⊆ C^n of dimension n − i + 1 such that for all nonzero x ∈ U, we have R_A(x) ≤ λ_i(A). The direct sum ⊕_{j=i}^n E_j has dimension n − i + 1; let U be this direct sum. For all nonzero x ∈ U, we have R_A(x) ≤ λ_i(A), completing the infimum-supremum proof.

Showing that the supremum is in fact a maximum can be done by noting that the supremum may be taken over the unit sphere, a compact set, and that the Rayleigh quotient is continuous; by the extreme value theorem, the maximum exists. Showing that the infimum is a minimum can be achieved by noting that the U specified in the proof of the second inequality attains the value λ_i(A) for the supremum of R_A. By the first inequality, λ_i(A) is less than or equal to the infimum over all U of dimension n − i + 1 of this quantity, so this U attains a minimum value.

The proof of the following theorem is based on the proof of the same theorem in [7].

Theorem 4.17 (Weyl's Inequality for Singular Values).
Let $A$ and $B$ be $m \times n$ complex matrices, and let $i$ and $j$ be in $[n]$ such that $i+j-1$ is in $[n]$. Then we have
$$\sigma_{i+j-1}(A+B) \leq \sigma_i(A) + \sigma_j(B).$$

Proof. Let $A$ and $B$ be $m \times n$ complex matrices. $\sigma_i(A)$ is equal to $\sqrt{\lambda_i(A^* A)}$, so by the min-max theorem we have the following for any $m \times n$ complex matrix $C$:
$$\sigma_i^2(C) = \min_{\substack{U \subseteq \mathbb{C}^n \\ \dim U = n-i+1}} \; \max_{\substack{x \in U \\ \|x\| = 1}} x^* C^* C x.$$
Notice that $x^* C^* C x$ equals $(Cx)^* Cx$. This is just the inner product $\langle Cx, Cx \rangle$, or $\|Cx\|^2$. Therefore, we have
$$\sigma_i^2(C) = \min_{\substack{U \subseteq \mathbb{C}^n \\ \dim U = n-i+1}} \; \max_{\substack{x \in U \\ \|x\| = 1}} \|Cx\|^2.$$
Since $\max(x^2) = \max(x)^2$ and $\min(x^2) = \min(x)^2$ hold for $x \geq 0$, we can simplify this to
$$\sigma_i(C) = \min_{\substack{U \subseteq \mathbb{C}^n \\ \dim U = n-i+1}} \; \max_{\substack{x \in U \\ \|x\| = 1}} \|Cx\|. \tag{4.18}$$

We want to show $\sigma_{i+j-1}(A+B) \leq \sigma_i(A) + \sigma_j(B)$, so it suffices to show there exists a $W \subseteq \mathbb{C}^n$ of dimension $n-i-j+2$ such that for all unit vectors $x \in W$, we have $\|(A+B)x\| \leq \sigma_i(A) + \sigma_j(B)$.

By 4.18, there exists a $U \subseteq \mathbb{C}^n$ of dimension $n-i+1$ such that the maximum of $\|Ax\|$ among all unit vectors $x \in U$ is equal to $\sigma_i(A)$. Therefore, for all unit vectors $x \in U$, we have $\|Ax\| \leq \sigma_i(A)$. Likewise, there exists a $V \subseteq \mathbb{C}^n$ of dimension $n-j+1$ such that for all unit vectors $x \in V$, we have $\|Bx\| \leq \sigma_j(B)$. By Lemma 4.15, we have
$$\dim(U+V) + \dim(U \cap V) = \dim(U) + \dim(V).$$
Since $U + V \subseteq \mathbb{C}^n$, we know $\dim(U+V) \leq n$, so we have
$$n + \dim(U \cap V) \geq 2n - i - j + 2 \quad\Longrightarrow\quad \dim(U \cap V) \geq n - i - j + 2.$$
Furthermore, since $i+j-1 \leq n$, we have $1 \leq n-i-j+2$, so we can find $W \subseteq U \cap V$ with $\dim(W) = n-i-j+2$. For all unit vectors $x \in W$, we have
$$\|(A+B)x\| = \|Ax + Bx\| \leq \|Ax\| + \|Bx\| \leq \sigma_i(A) + \sigma_j(B). \qquad \square$$

4.3. Low Rank Approximation.

Lemma 4.19. Given an $m \times n$ complex matrix $A$, we have $\|A^*\|_{(p,k)} = \|A\|_{(p,k)}$.

Proof. Let $A$ be an $m \times n$ complex matrix, and let $r := \min(m, n)$. From Proposition 3.5, we have $\sigma_i(A) = \sigma_i(A^*)$ for $i \in [r]$. Since $k \leq r$, we have
$$\|A^*\|_{(p,k)} = \Big(\sum_{i=1}^{k} \sigma_i(A^*)^p\Big)^{1/p} = \Big(\sum_{i=1}^{k} \sigma_i(A)^p\Big)^{1/p} = \|A\|_{(p,k)}.$$

Lemma 4.20. The norm $\|\cdot\|_{(p,k)}$ is unitarily invariant. That is, for an $m \times n$ complex matrix $A$, an $m \times m$ unitary matrix $S$, and an $n \times n$ unitary matrix $T$, we have
$$\|SA\|_{(p,k)} = \|A\|_{(p,k)} = \|AT\|_{(p,k)}.$$

Proof. Given $A$, $S$, and $T$, we have
$$\|SA\|_{(p,k)}^p = \sum_{i=1}^{k} \sigma_i(SA)^p = \sum_{i=1}^{k} \lambda_i(A^* S^* S A)^{p/2} = \sum_{i=1}^{k} \lambda_i(A^* A)^{p/2} = \sum_{i=1}^{k} \sigma_i(A)^p = \|A\|_{(p,k)}^p.$$
Taking the $p$th root, we get $\|SA\|_{(p,k)} = \|A\|_{(p,k)}$.
From Lemma 4.19, we know $\|A^*\|_{(p,k)} = \|A\|_{(p,k)}$. This lets us deduce
$$\|AT\|_{(p,k)}^p = \|T^* A^*\|_{(p,k)}^p = \sum_{i=1}^{k} \sigma_i(T^* A^*)^p = \sum_{i=1}^{k} \lambda_i(A T T^* A^*)^{p/2} = \sum_{i=1}^{k} \lambda_i(A A^*)^{p/2} = \sum_{i=1}^{k} \sigma_i(A^*)^p = \|A^*\|_{(p,k)}^p = \|A\|_{(p,k)}^p.$$
Taking the $p$th root, we get $\|AT\|_{(p,k)} = \|A\|_{(p,k)}$.

The strategy of using Weyl's inequality for the following proof comes from [1].

Theorem 4.21 (The Low Rank Approximation Theorem). Let $A$ be an $m \times n$ complex matrix. By the SVD, we have $A = UDV^*$ for an $m \times n$ rectangular diagonal matrix $D$ and unitary matrices $U$ and $V$. Let $D_q$ be the matrix obtained by setting all but the first $q$ diagonal entries of $D$ to zero, and let $B_q := U D_q V^*$. Then $B_q$ is a closest matrix to $A$ of rank at most $q$. That is, for all $m \times n$ complex matrices $B$ of rank at most $q$, we have
$$\|A - B_q\|_{(p,k)} \leq \|A - B\|_{(p,k)}.$$

Proof. Let $A$ be an $m \times n$ complex matrix, and define $B_q$ as in the theorem statement. By Lemma 4.20, we have
$$\|A - B_q\|_{(p,k)} = \|UDV^* - U D_q V^*\|_{(p,k)} = \|U(D - D_q)V^*\|_{(p,k)} = \|D - D_q\|_{(p,k)}.$$
The matrix $D - D_q$ is an $m \times n$ rectangular diagonal matrix equal to $D$, except that the first $q$ diagonal entries have been zeroed out. Moreover, since the singular values of $D$ are equal to the singular values of $A$, we have
$$\|D - D_q\|_{(p,k)}^p = \sum_{i=1}^{k} \sigma_i(D - D_q)^p = \sum_{i=q+1}^{q+k} \sigma_i(D)^p = \sum_{i=q+1}^{q+k} \sigma_i(A)^p.$$
By Weyl's inequality for singular values, for compatible $X$, $Y$, $i$, and $j$ we have $\sigma_{i+j-1}(X+Y) \leq \sigma_i(X) + \sigma_j(Y)$. Let $B$ be an $m \times n$ complex matrix of rank at most $q$. We will make the following substitutions:
$$j = q+1, \qquad Y = B, \qquad X = A - B.$$
This gives us $\sigma_{i+q}(A) \leq \sigma_i(A-B) + \sigma_{q+1}(B)$. Since $\operatorname{rk}(B) \leq q$, we know $\sigma_{q+1}(B) = 0$, so this simplifies to $\sigma_{i+q}(A) \leq \sigma_i(A-B)$. Now, we complete the proof:
$$\|A - B\|_{(p,k)}^p = \sum_{i=1}^{k} \sigma_i(A-B)^p \geq \sum_{i=1}^{k} \sigma_{i+q}(A)^p = \sum_{i=q+1}^{q+k} \sigma_i(A)^p = \|D - D_q\|_{(p,k)}^p = \|A - B_q\|_{(p,k)}^p.$$
Therefore, $\|A - B_q\|_{(p,k)} \leq \|A - B\|_{(p,k)}$. $\square$
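As a numerical sanity check of Weyl's inequality for singular values (not part of the paper's development), the sketch below verifies $\sigma_{i+j-1}(A+B) \leq \sigma_i(A) + \sigma_j(B)$ for every admissible index pair on random complex matrices. The matrix sizes, the random seed, and the use of Gaussian entries are arbitrary illustrative choices.

```python
import numpy as np

# Spot-check Weyl's inequality for singular values:
#     sigma_{i+j-1}(A + B) <= sigma_i(A) + sigma_j(B)
# for random complex matrices; sizes and seed are illustrative choices.
rng = np.random.default_rng(0)
m, n = 6, 5
A = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))
B = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))

sA = np.linalg.svd(A, compute_uv=False)    # singular values, descending
sB = np.linalg.svd(B, compute_uv=False)
sAB = np.linalg.svd(A + B, compute_uv=False)

# Check every admissible (i, j) with i + j - 1 in [n] (1-based indices,
# shifted to 0-based array positions; small tolerance for rounding).
for i in range(1, n + 1):
    for j in range(1, n + 1):
        if i + j - 1 <= n:
            assert sAB[i + j - 2] <= sA[i - 1] + sB[j - 1] + 1e-12
```

Any failure of an assertion would indicate a bug in the check rather than in the theorem, which holds for all complex matrices of compatible size.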
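Theorem 4.21 can likewise be illustrated numerically: the truncated SVD $B_q$ should be at least as close to $A$, in the $(p,k)$ norm, as any other matrix of rank at most $q$. The sketch below compares $B_q$ against randomly sampled rank-$\leq q$ competitors; the helper `pk_norm` and all parameter choices ($m$, $n$, $q$, $p$, $k$, seed) are illustrative assumptions rather than notation taken from the references.

```python
import numpy as np

def pk_norm(M, p, k):
    """(p, k) norm: p-th root of the sum of the p-th powers of the
    k largest singular values of M."""
    s = np.linalg.svd(M, compute_uv=False)  # descending order
    return np.sum(s[:k] ** p) ** (1.0 / p)

rng = np.random.default_rng(1)
m, n, q, p, k = 7, 5, 2, 3, 4
A = rng.standard_normal((m, n))

# Truncated SVD: keep only the q largest singular values.
U, s, Vh = np.linalg.svd(A, full_matrices=False)
Bq = (U[:, :q] * s[:q]) @ Vh[:q, :]     # rank at most q

err_opt = pk_norm(A - Bq, p, k)
for _ in range(200):
    # A random rank-<=q competitor, written as a product of thin factors.
    X = rng.standard_normal((m, q)) @ rng.standard_normal((q, n))
    assert err_opt <= pk_norm(A - X, p, k) + 1e-12
```

Random sampling of course only probes the claim; the theorem guarantees the inequality for every rank-$\leq q$ matrix, not just the sampled ones.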
4.4. Discussion. We have shown that $B_q$ is a closest rank-$q$ approximation to $A$ with respect to the $(p,k)$ norm. As was mentioned at the start of this section, this result is exactly what we would expect from a norm like the $(p,k)$ norm, which measures the largeness of the greatest singular values. Whether the $(p,k)$ norms encompass all norms for which low rank approximation holds with $B_q$ is an avenue for further exploration.

Acknowledgments. It is a pleasure to thank my mentor, Tung Tho Nguyen, for helping to guide me in writing this paper and in learning about the SVD and its applications. I am also grateful to Professor László Babai for his lectures in the UChicago Mathematics REU apprentice class, which inspired this paper, and to Professor Peter May for the opportunity to participate in this REU.

Bibliography Guide. As a courtesy to the reader, this section contains some pointers on using the references efficiently. The bullets denote the citation being referred to.

[1] The edit to the original post by Algebraic Pavel is where the Weyl inequality is used.
[2] The "Direct Sums" section number in this book is 2.3.
[4] The "Everything in Between" section describes the $(p,k)$ norm and mentions the norms that it generalizes.
[5] The theorem used in this paper is titled Theorem 1 by Mirsky, and can be found on the second page of the article.
[6] The proof referred to in this paper is in Section 2.1 of the Wikipedia article. This citation refers to a specific version of the article, so be sure to view that version.
[8] This reference was useful mainly because it pointed me to symmetric gauge functions for proving the triangle inequality for the $(p,k)$ norm. It has a lot of other information on Schatten $p$-norms and Ky Fan $k$-norms, though, and even proves unitary invariance for the $(p,k)$ norm in Section 6.2.
References

[1] Algebraic Pavel. "Proof of Eckart-Young-Mirsky theorem." Mathematics Stack Exchange. URL (version: ).
[2] Broida, Joel G., and S. Gill Williamson. "Direct Sums." Comprehensive Introduction to Linear Algebra. Print.
[3] Gallier, Jean H. "Singular Value Decomposition (SVD) and Polar Form." Geometric Methods and Applications: For Computer Science and Engineering. New York: Springer. SpringerLink. Web. 26 Aug.
[4] Johnston, Nathaniel. "Ky Fan Norms, Schatten Norms, and Everything in Between." Nathaniel Johnston. 21 Aug. Web. 25 Aug.
[5] Mirsky, L. "Symmetric Gauge Functions and Unitarily Invariant Norms." The Quarterly Journal of Mathematics 11.1 (1960). Web.
[6] "Projection (linear algebra)." Wikipedia. Wikimedia Foundation, 24 Aug. Web. 27 Aug.
[7] Tao, Terence. "254A, Notes 3a: Eigenvalues and Sums of Hermitian Matrices." What's New. 13 Jan. Web. 29 Aug.
[8] Vasuki, Vishvas. "Script of a Talk on the Ky Fan and Schatten p Norms." Web. 25 Aug.
More informationFall TMA4145 Linear Methods. Exercise set Given the matrix 1 2
Norwegian University of Science and Technology Department of Mathematical Sciences TMA445 Linear Methods Fall 07 Exercise set Please justify your answers! The most important part is how you arrive at an
More information