Singular Value Decomposition


Singular Value Decomposition
Massoud Malek

One of the most useful results from linear algebra is a matrix decomposition known as the singular value decomposition. It has useful applications in almost every scientific discipline; for example, it is an important tool for data compression.

To obtain the singular value decomposition of a matrix A, we take advantage of the fact that both A*A and A A* are Hermitian (for real A, A^t A and A A^t are symmetric) with non-negative eigenvalues. Hermitian and symmetric matrices have the nice property that their eigenvectors form an orthonormal basis.

For any m × n real or complex matrix A = (a_ij), there exists a factorization of the form

    A = U Σ V*,

called the singular value decomposition (SVD). The m × m unitary matrix U consists of orthonormal eigenvectors of A A*, and the n × n unitary matrix V consists of orthonormal eigenvectors of A*A. The matrix Σ = (σ_ij) is an m × n matrix whose entries are all zero except the diagonal entries σ_ii (written simply σ_i), which are the square roots of the eigenvalues of A*A (equivalently, of A A*). These diagonal entries of Σ are called the singular values of A. Although there are many ways to choose U and V from the eigenvectors of A A* and A*A, it is customary to choose them so that

    σ₁ ≥ σ₂ ≥ ⋯ ≥ σ_p ≥ 0,    p = min(m, n).

To solve a system of linear equations whose matrix is singular or rectangular, one must use the SVD in order to find solutions, if any exist.

Truncated SVD

The sizes of the matrices in the SVD are as follows: U is m × m, Σ is m × n, and V is n × n. Thus Σ has the same size as A, while U and V are square. However, if m > n, the bottom (m - n) × n block of Σ is zero, so the last m - n columns of U are multiplied by zero. Similarly, if m < n, the rightmost m × (n - m) block of Σ is zero, and it multiplies the last n - m rows of V*. This suggests a smaller, equivalent version of the SVD. If p = min(m, n), we can define U_p = U( :, 1:p ), Σ_p = Σ( 1:p, 1:p ), and V_p = V( :, 1:p ), and write A = U_p Σ_p V_p*, where U_p is m × p, Σ_p is p × p, and V_p is n × p.
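The reduced factorization U_p Σ_p V_p* is what MATLAB returns as the economy-size SVD. The following minimal sketch illustrates the full and reduced forms; the 3 × 2 matrix is an arbitrary example chosen here for illustration, not one taken from these notes.

    % A minimal sketch (the 3-by-2 matrix is an arbitrary example, not one from these notes).
    A = [4 0; 3 -5; 0 1];            % any m-by-n matrix will do
    [U, S, V] = svd(A);              % full SVD: U is 3-by-3, S is 3-by-2, V is 2-by-2
    [Up, Sp, Vp] = svd(A, 'econ');   % economy-size SVD: Up is 3-by-2, Sp and Vp are 2-by-2
    norm(A - U*S*V', 'fro')          % essentially zero: the factorization reproduces A
    norm(A - Up*Sp*Vp', 'fro')       % essentially zero: nothing is lost in the reduced form
    diag(Sp)'                        % the singular values, in decreasing order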

Moreover, if p - r singular values are zero, we can let U_r = U( :, 1:r ), Σ_r = Σ( 1:r, 1:r ), and V_r = V( :, 1:r ), and write A_r = U_r Σ_r V_r*. Then we have A = A_r = U_r Σ_r V_r*, which is an even smaller SVD.

It is often the case that the matrix A has one dimension much bigger than the other; a tall matrix with many more rows than columns is a typical case. For such examples, much of the computation and memory required for the standard SVD may not actually be needed; instead, a truncated, or reduced, version ( A_r = U_r Σ_r V_r* ) is appropriate. It will be computed faster and require less memory to store the data.

Definition 1. The 2-norm and the Frobenius norm of an m × n matrix A = (a_ij) can be easily computed from the SVD:

    ‖A‖₂ = σ₁ ;        ‖A‖_F = ( Σ_{i=1..m} Σ_{j=1..n} |a_ij|² )^{1/2} = ( σ₁² + σ₂² + ⋯ + σ_p² )^{1/2} ,    p = min(m, n).

Theorem 1 (Eckart-Young-Mirsky, 1936). If σ₁ ≥ σ₂ ≥ ⋯ ≥ σ_r are the nonzero singular values of the m × n matrix A, then for each k < r the distance from A to the closest rank-k matrix B_k is

    min_{rank(B_k) = k} ‖A - B_k‖₂ = σ_{k+1} ,        min_{rank(B_k) = k} ‖A - B_k‖_F = ( σ_{k+1}² + ⋯ + σ_r² )^{1/2} .

Thus σ_{k+1} is a measure of how well A_k approximates A. In fact, for A of rank r and any k < r, the truncated matrix A_k = U_k Σ_k V_k^t is the best rank-k matrix approximation to A in both norms.

Proof. For any rank-k matrix B_k of the same size as A, define the matrix C = (c_ij) = U^t B_k V, so that B_k = U C V^t. Then

    ‖A - B_k‖_F² = ‖U Σ V^t - U C V^t‖_F² = ‖Σ - C‖_F² = Σ_i ( σ_i - c_ii )² + Σ_{i ≠ j} c_ij² ≥ Σ_i ( σ_i - c_ii )² .

Thus clearly the rank-k choice of C that minimizes ‖A - B_k‖_F is the diagonal matrix with ( σ₁, σ₂, …, σ_k ) as its nonzero diagonal entries. The corresponding B_k matches exactly the truncated matrix A_k.

Definition 2. The condition number associated with the linear equation A x = b gives a bound on how inaccurate the solution x will be after approximation. Note that this is before the effects of round-off error are taken into account; conditioning is a property of the matrix, not of the algorithm or of the floating-point accuracy of the computer used to solve the corresponding system. In particular, one should think of the condition number as (very roughly) the rate at which the solution x will change with respect to a change in b. Thus, if the condition number is large, even a small error in b may cause a large error in x; on the other hand, if the condition number is small, then the error in x will not be much bigger than the error in b. More precisely, the condition number is the maximum ratio of the relative error in x to the relative error in b. The condition number of a matrix is the ratio of its largest singular value to its smallest singular value,

    κ(A) = σ_max / σ_min .

If one of the σ_i is zero, or so small that its value is dominated by round-off error, then there is a problem!
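Both the theorem and the condition number are easy to check numerically. A minimal MATLAB sketch, using an arbitrary test matrix that is not taken from these notes:

    % A minimal sketch checking Theorem 1 and Definition 2 numerically.
    A = gallery('lotkin', 6);                 % a small, very ill-conditioned test matrix
    [U, S, V] = svd(A);
    k = 3;
    Ak = U(:,1:k) * S(1:k,1:k) * V(:,1:k)';   % truncated SVD: the best rank-k approximation
    norm(A - Ak, 2)                           % equals S(k+1,k+1), the (k+1)-st singular value
    norm(A - Ak, 'fro')                       % equals sqrt(sum(diag(S(k+1:end,k+1:end)).^2))
    s = diag(S);
    s(1) / s(end)                             % condition number: largest / smallest singular value
    cond(A)                                   % MATLAB's 2-norm condition number agrees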

Definition 3. A pseudo-inverse, also known as a generalized inverse or Moore-Penrose inverse, denoted by A⁺, is a generalization of the inverse matrix. Every matrix, including a rectangular or singular one, has a unique pseudo-inverse. For any m × n real or complex matrix A, the pseudo-inverse of A is defined as the n × m matrix A⁺ satisfying all of the following four criteria:

    A A⁺ A = A          ( A A⁺ maps all column vectors of A to themselves );
    A⁺ A A⁺ = A⁺        ( A⁺ is a weak inverse for the multiplicative semigroup );
    ( A A⁺ )* = A A⁺    ( A A⁺ is Hermitian );  and
    ( A⁺ A )* = A⁺ A    ( A⁺ A is also Hermitian ).

A common use of the pseudo-inverse is to compute a best-fit (least squares) solution to a system of linear equations that lacks a unique solution. Another use is to find the minimum (Euclidean) norm solution to a system of linear equations with multiple solutions. The singular value decomposition can be used to compute the pseudo-inverse of a rectangular or singular matrix. Indeed, the pseudo-inverse of a matrix M with singular value decomposition M = U Σ V* is

    M⁺ = V Σ⁺ U* ,

where Σ⁺ is the pseudo-inverse of Σ, formed by replacing every non-zero diagonal entry of Σ by its reciprocal and transposing the resulting matrix. The pseudo-inverse is one way to solve linear least squares problems.

Theorem 2 (Gauss and Legendre). Every linear system A x = b, where A is an m × n matrix, has a solution x minimizing ‖A x - b‖. These solutions are given by the equation A^t A x = A^t b, called the normal equations. Furthermore, when the columns of A are linearly independent, A^t A is invertible, so x is unique and given by x = (A^t A)⁻¹ A^t b.

Solving a System of Linear Equations Using the Generalized Inverse

Consider two systems of linear equations, S₁ : A₁ X = b₁ and S₂ : A₂ Y = b₂, where

    A₁ = [  1    1   -1
            1   -1    1 ]        (2 × 3),      X = ( x₁, x₂, x₃ )^t ,

    A₂ = [  1    1
            1   -1
           -1    1 ]             (3 × 2),      Y = ( y₁, y₂ )^t ,

and b₁, b₂ are the given right-hand sides. Notice that S₁ has more unknowns than equations, while S₂ has more equations than unknowns. To solve these two systems, we first need to find the SVD of both A₁ and A₂, and from the SVD obtain A₁⁺ and A₂⁺. The solutions of the two linear systems are then obtained by multiplying A₁⁺ by b₁ and A₂⁺ by b₂.
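Before working through the nine steps by hand, here is a minimal MATLAB sketch of the same computation, building A⁺ from the SVD as in Definition 3; the right-hand side used at the end is an arbitrary illustration, not the b₁ of S₁.

    % A minimal sketch of Definition 3: building A+ from the SVD.
    A = [1 1 -1; 1 -1 1];                         % the 2-by-3 coefficient matrix A1 above
    [U, S, V] = svd(A);
    Splus = S';                                   % transpose Sigma ...
    Splus(Splus ~= 0) = 1 ./ Splus(Splus ~= 0);   % ... and invert its non-zero entries
    Aplus = V * Splus * U'                        % A+ = V Sigma+ U^t
    norm(Aplus - pinv(A))                         % essentially zero: agrees with MATLAB's pinv
    b = [4; 2];                                   % an arbitrary right-hand side, for illustration only
    x = Aplus * b                                 % minimum-norm least squares solution of A x = b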

Step 1. Construct:

    B₁ = A₁ A₁^t = [  3  -1
                     -1   3 ] ,          C₁ = A₁^t A₁ = [  2   0   0
                                                           0   2  -2
                                                           0  -2   2 ] ;

    B₂ = A₂ A₂^t = [  2   0   0
                      0   2  -2
                      0  -2   2 ] ,      C₂ = A₂^t A₂ = [  3  -1
                                                          -1   3 ] .

Step 2. We have rank(A₁) = rank(A₂) = 2. Thus the characteristic polynomials of the above matrices are as follows:

    K_{B₁}(λ) = K_{C₂}(λ) = (λ - 4)(λ - 2)        and        K_{C₁}(λ) = K_{B₂}(λ) = (λ - 4)(λ - 2) λ .

Step 3. These eigenvalues produce the matrices

    Σ₁ = [ 2   0   0
           0  √2   0 ]          and          Σ₂ = [ 2   0
                                                    0  √2
                                                    0   0 ] .

Step 4. Next we need to find linearly independent eigenvectors corresponding to λ₁ = 4, λ₂ = 2, and λ₃ = 0, in order to construct U₁, V₁ and U₂, V₂:

    B(A₁ A₁^t) = B(A₂^t A₂) = { (1, -1), (1, 1) } ,
    B(A₁^t A₁) = B(A₂ A₂^t) = { (0, 1, -1), (1, 0, 0), (0, 1, 1) } .

Step 5. By normalizing the above eigenvectors, we obtain:

    B(A₁ A₁^t) = B(A₂^t A₂) = { (1/√2)(1, -1), (1/√2)(1, 1) } ,
    B(A₁^t A₁) = B(A₂ A₂^t) = { (1/√2)(0, 1, -1), (1, 0, 0), (1/√2)(0, 1, 1) } .

Step 6. Using these two bases, we construct:

    U₁ = [  1/√2   1/√2
           -1/√2   1/√2 ] ,          V₁ = [   0      1     0
                                            1/√2     0    1/√2
                                           -1/√2     0    1/√2 ] ,

    U₂ = [   0      1     0
           1/√2     0    1/√2
          -1/√2     0    1/√2 ] ,    V₂ = [  1/√2   1/√2
                                            -1/√2   1/√2 ] .

Step 7. The factorizations of A₁ and A₂ are

    A₁ = U₁ Σ₁ V₁^t        and        A₂ = U₂ Σ₂ V₂^t ,

with the matrices found in Steps 3 and 6. Notice that by eliminating the third column of Σ₁ and the third row of V₁^t (respectively, the third column of U₂ and the third row of Σ₂), we obtain the same products with smaller factors: the truncated SVDs of A₁ and A₂.

Step 8. Now we can get the pseudo-inverses:

    A₁⁺ = V₁ Σ₁⁺ U₁^t = [  1/2    1/2
                           1/4   -1/4
                          -1/4    1/4 ] ,

    A₂⁺ = V₂ Σ₂⁺ U₂^t = [  1/2    1/4   -1/4
                           1/2   -1/4    1/4 ] .

Step 9. Multiplying A₁⁺ by b₁ gives a solution of the linear system S₁; among all possible solutions it has minimum Euclidean norm. Likewise, multiplying A₂⁺ by b₂ gives the unique (least squares) solution of the linear system S₂.

Least Squares Estimation

Given an m × n matrix A and an m-vector b, if the constant vector b is not in the range of the matrix A, then there is no vector x such that A x = b. So

    x = V_r [ diag( σ₁, σ₂, …, σ_r ) ]⁻¹ U_r^t b

cannot be used to obtain an exact solution. However, the vector returned will do the closest possible job in the least squares sense: it finds the vector x which minimizes

    R = ‖ A x - b ‖ .

R is called the residual of the solution.
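In MATLAB the same recipe reads as follows; this is a minimal sketch with an arbitrary rank-deficient matrix and right-hand side, neither of which is taken from these notes.

    % A minimal sketch of x = V_r * inv(Sigma_r) * U_r^t * b for a rank-deficient system.
    A = [1 2; 2 4; 3 6];                          % a rank-one 3-by-2 matrix (arbitrary example)
    b = [1; 1; 1];                                % b is not in the range of A
    [U, S, V] = svd(A);
    s = diag(S);
    r = sum(s > max(size(A)) * eps(s(1)));        % numerical rank: singular values above a tolerance
    x = V(:,1:r) * diag(1 ./ s(1:r)) * U(:,1:r)' * b
    norm(A*x - b)                                 % the residual R = || A x - b ||, as small as possible
    norm(x - pinv(A)*b)                           % essentially zero: the pseudo-inverse gives the same x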

For example, a system A x = b may have an exact solution for one right-hand side b, but not after b is replaced by a vector outside the range of A. In such cases, when b is not in the range of A, we seek to minimize the residual of the solution.

Finding the Solution of the Least Squares Problem A x = b with Minimal Norm

Consider two small linear systems, A₁ x = b₁ and A₂ y = b₂. The minimal-norm least squares solution of each is found in four steps.

Step 1. Compute the SVD, [ U, S, V ] = svd( A ), so that A = U Σ V^t.

Step 2. Choose r such that σ_r is the smallest non-zero singular value.

Step 3. Form U_r = U( :, 1:r ), Σ_r = Σ( 1:r, 1:r ), and V_r = V( :, 1:r ).

Step 4. The solution with minimal norm is

    x = V_r Σ_r⁻¹ U_r^t b .

Carrying out these four steps for A₁ and b₁, and for A₂ and b₂, produces the minimal-norm solutions x and y of the two systems.

Notice that A₁ x = b₁ and A₂ y = b₂. Now, consider the system A₃ z = b₃, with unknowns z₁ and z₂. Using the SVD, we obtain U, S, and V. The rank of A₃ is two, so we find U₂ = U( :, 1:2 ), S₂ = S( 1:2, 1:2 ), and V₂ = V( :, 1:2 ). The optimal solution is z = V₂ S₂⁻¹ U₂^t b₃. But A₃ z ≠ b₃; the reason for not being able to find an exact solution is that b₃ is not in the range of A₃.

Data Compression

Any image from a digital camera, a scanner, or a computer is a digital image. Suppose we have a 9-megapixel gray-scale image, which is a × b = 9 megapixels (an a × b matrix). For each pixel, we have some level of black and white, given by some integer between 0 and 255. Each of these integers (and hence each pixel) requires approximately 1 byte to store, resulting in an approximately 8.6 Mb image. A color image usually has three components: a red, a green, and a blue (RGB). The composite of the three RGB values creates the final color for a single pixel. Storing color images therefore involves three matrices of the same size, so it takes three times the space (25.8 Mb).

Now suppose we have a digital image taken with a 9-megapixel camera and each color pixel is determined by a 24-bit number, but when we print the picture we only use 8-bit colors. We are still using 9 million pixels, but the information used to describe the image has been reduced to one third of what it was. This is an example of image compression. We will look at compressing such an image by computing the singular value decomposition (SVD).

The truncated SVD of the real m × n matrix A with rank r is

    A = U_r Σ_r V_r^t = u₁ σ₁ v₁^t + u₂ σ₂ v₂^t + ⋯ + u_r σ_r v_r^t ,

where U_r = [ u₁, …, u_r ] is an m × r matrix with orthonormal columns, σ₁ ≥ σ₂ ≥ ⋯ ≥ σ_r, and V_r = [ v₁, …, v_r ] is an n × r matrix with orthonormal columns. Since σ₁ ≥ σ₂ ≥ ⋯ ≥ σ_r, the terms u_i σ_i v_i^t with small i contribute most to the sum, and hence contain the most information about the image.
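As a concrete sketch, the sum can be truncated after k terms to compress a grayscale image. The file named below is a sample image shipped with MATLAB's Image Processing Toolbox and stands in for any grayscale photograph; the choice k = 20 is just an illustration.

    % A minimal sketch of rank-k image compression with the truncated SVD.
    A = double(imread('cameraman.tif'));           % load the image as a matrix of gray levels 0..255
    [U, S, V] = svd(A);
    k = 20;                                        % number of singular values kept (a tunable choice)
    Ak = U(:,1:k) * S(1:k,1:k) * V(:,1:k)';        % best rank-k approximation of the image
    relative_loss = norm(A - Ak, 'fro') / norm(A, 'fro')
    imshow(uint8(Ak));                             % compare visually with imshow(uint8(A))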

Keeping only some of these terms may result in lower image quality, but also in lower storage size. Thus we can reduce the size of those three matrices. This process is sometimes called Principal Component Analysis (PCA).

Suppose we are using the SVD of an n × n matrix A. Then we need to store n² elements for each of U and V, and at most n elements for Σ. Keeping one term of the SVD, u₁ σ₁ v₁^t, requires only 2n + 1 elements. If we keep only half of the σ_i's, then storing the truncated SVD and storing the original matrix take approximately the same space.

Color images are in RGB, where each color is specified by an integer from 0 to 255. This gives us three matrices. The truncated SVD can be computed on all three separately or together.

We are also interested in the relative change of a rank-reduced matrix, to determine by how much we may reduce the rank without significant loss of information. This is measured by the formula

    ‖ A - A_k ‖_F / ‖ A ‖_F .

Example. Consider the 8 × 6 matrix A of rank 3 (the first six columns of the 8 × 8 magic square) and an 8-vector b:

    A = [ 64    2    3   61   60    6
           9   55   54   12   13   51
          17   47   46   20   21   43
          40   26   27   37   36   30
          32   34   35   29   28   38
          41   23   22   44   45   19
          49   15   14   52   53   11
           8   58   59    5    4   62 ] .

Computing the SVD of A, for instance with [ U, S, V ] = svd( A ) in MATLAB, produces an 8 × 8 orthogonal matrix U, an 8 × 6 matrix Σ whose only nonzero entries are the three singular values σ₁ > σ₂ > σ₃ > 0, and a 6 × 6 orthogonal matrix V.

Since the rank of A is three, we may truncate the above matrices and obtain

    U₃ = U( :, 1:3 ),    Σ₃ = Σ( 1:3, 1:3 ) = diag( σ₁, σ₂, σ₃ ),    V₃ = V( :, 1:3 ),

with

    U₃ Σ₃ V₃^t = [ 64    2    3   61   60    6
                    9   55   54   12   13   51
                   17   47   46   20   21   43
                   40   26   27   37   36   30
                   32   34   35   29   28   38
                   41   23   22   44   45   19
                   49   15   14   52   53   11
                    8   58   59    5    4   62 ]  =  A .

By choosing only the first two singular values of A and keeping the first two columns of U and V, i.e., U₂ = U( :, 1:2 ) and V₂ = V( :, 1:2 ), we obtain the rank-two matrix

    A₂ = U₂ Σ₂ V₂^t ,        Σ₂ = diag( σ₁, σ₂ ) .

According to the Eckart-Young theorem, A₂ is the best rank-2 matrix approximation to A. Since A - A₂ has only one nonzero singular value, its Frobenius norm and its 2-norm coincide:

    ‖ A - A₂ ‖_F = ‖ A - A₂ ‖₂ = σ₃ ;

in MATLAB, norm( A - A2, 'fro' ) and norm( A - A2, 2 ) return the same value. The relative error of reducing A to the rank-2 matrix A₂ is

    ‖ A - A₂ ‖_F / ‖ A ‖_F = σ₃ / ‖ A ‖_F ,

which is small: little information is lost in passing from A to A₂.

By choosing only the first singular value of A and keeping only the first column of U and of V, i.e., U₁ = U( :, 1 ) and V₁ = V( :, 1 ), we obtain the rank-one matrix

    A₁ = U₁ Σ₁ V₁^t = σ₁ u₁ v₁^t .

According to the Eckart-Young theorem, A₁ is the best rank-1 matrix approximation to A. Here the 2-norm of the error is ‖ A - A₁ ‖₂ = σ₂, while the Frobenius norm is ‖ A - A₁ ‖_F = ( σ₂² + σ₃² )^{1/2}. The relative error of reducing A to the rank-one matrix A₁,

    ‖ A - A₁ ‖_F / ‖ A ‖_F ,

is far larger than for A₂: the loss of information from A to A₁ is about half. Clearly this kind of loss is not acceptable.

To find the solution of the least squares problem A x = b with minimal norm, we use U₃, Σ₃, and V₃:

    x = V₃ Σ₃⁻¹ U₃^t b .

If instead we use U₂, Σ₂, and V₂, we obtain

    y = V₂ Σ₂⁻¹ U₂^t b ,

which only approximately satisfies the system, leaving a nonzero residual ‖ A y - b ‖.

Remark. Notice that the matrix A has 8 · 6 = 48 entries. The total number of entries of U₃, Σ₃, and V₃ is 8·3 + 3·3 + 6·3 = 51. If we store these three matrices instead of the original matrix A, we only need to store three more entries, but these matrices may then be used to reproduce the original matrix A and to define A₂ and A₁.
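The computations of this example are easy to reproduce. The sketch below rebuilds the truncated factors, the rank-2 and rank-1 approximations, their relative errors, and the entry counts from the remark; the right-hand side b is a placeholder, since the entries of the original b are not reproduced above.

    % A minimal sketch reproducing this example in MATLAB.
    A = magic(8);  A = A(:, 1:6);              % the 8-by-6 matrix of rank 3 used above
    [U, S, V] = svd(A);
    U3 = U(:,1:3);  S3 = S(1:3,1:3);  V3 = V(:,1:3);
    norm(A - U3*S3*V3', 'fro')                 % essentially zero: the rank-3 truncation recovers A
    A2 = U(:,1:2) * S(1:2,1:2) * V(:,1:2)';    % best rank-2 approximation
    A1 = U(:,1)   * S(1,1)     * V(:,1)';      % best rank-1 approximation
    norm(A - A2, 'fro') / norm(A, 'fro')       % relative loss of information for A2
    norm(A - A1, 'fro') / norm(A, 'fro')       % relative loss of information for A1
    b = ones(8, 1);                            % placeholder right-hand side
    x = V3 * (S3 \ (U3' * b))                  % minimal-norm least squares solution via U3, S3, V3
    numel(A)                                   % 48 entries in A
    numel(U3) + numel(S3) + numel(V3)          % 24 + 9 + 18 = 51 entries in the truncated factors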

Exercise. Consider a 5 × 5 matrix A of rank four and a 5-vector b. Use MATLAB to find:

1. The best rank-3 matrix approximation A₃ to A.
2. The best rank-2 matrix approximation A₂ to A.
3. The best rank-1 matrix approximation A₁ to A.
4. The loss of information in each case.
5. The solution of the least squares problem A x = b with minimal norm.
6. Using A₃, the solution of the least squares problem A x = b; also find the residual.

Web Searching

Search engines like Google use enormous matrices of cross-references recording which pages link to which other pages, and which words are on each page. When you do a Google search, the higher ranks usually go to pages containing your key words that have lots of links to them. But there are billions of pages out there, and storing a several-hundred-thousand by several-billion matrix is trouble, not to mention searching through it.

Definition 4. Latent semantic indexing (LSI) is an indexing and retrieval method that uses the singular value decomposition to identify patterns in the relationships between the terms and concepts contained in an unstructured collection of text. LSI is based on the principle that words that are used in the same contexts tend to have similar meanings. A key feature of LSI is its ability to extract the conceptual content of a body of text by establishing associations between terms that occur in similar contexts.

Definition 5. A term-document matrix is a matrix that describes the frequency of terms occurring in a collection of documents. The (i, j)-element of the matrix A = (a_ij) is a positive integer p if the keyword in position i appears p times in document j, and is 0 otherwise.

The 3 billion document vectors associated with the Web (each of which is a 300,000-vector) are placed next to one another to form a 300,000-by-3,000,000,000 term-by-document matrix:

                    d₁   d₂   d₃   d₄   d₅   d₆   ⋯   d_3,000,000,000
    term 1
    term 2
    term 3
      ⋮
    term 300,000

This huge term-by-document matrix may contain redundant information; that is, there are dependencies among the columns of the matrix. The LSI method manipulates the matrix to eradicate these dependencies and thus considers only the independent, smaller part of this large term-by-document matrix.

In searching, you really care about the main directions in which the Web is heading, so the first few singular values already create a very good approximation to the enormous matrix. The SVD produces A = U Σ V^t; for each k less than the rank of A, the truncated matrix is A_k = U_k Σ_k V_k^t. The goal is to use the truncated A_k = U_k Σ_k V_k^t to approximate A, choosing k large enough to give a good approximation to A, yet small enough that k ≪ r, which requires much less storage.

Typically one does not only want to know about matches; one would like to find the most relevant documents, i.e., to rank the documents according to relevance. The angle θ_j between the j-th column of a term-by-document matrix A = [ A₁, A₂, …, A_j, …, A_n ] and the query vector q can be computed by

    cos(θ_j) = q^t A_j / ( ‖q‖ ‖A_j‖ ) ,      j = 1, 2, …, n .

Notice that when q is identical to the j-th column of A, the angle between q and A_j vanishes and the cosine of this angle is one. Thus, a large value of cos(θ_j) suggests that document j may be relevant.

Example. Suppose one has the following four (short) documents:

    Document 1 = {Math, Math, Calculus, SVD, Algebra}
    Document 2 = {Math, Linear, SVD, Abstract, Algebra}
    Document 3 = {Linear, Abstract, Algebra, Calculus, Math, SVD}
    Document 4 = {Math, Algebra, Calculus, SVD, SVD}

The term-document matrix is:

                 Document 1   Document 2   Document 3   Document 4
    Abstract          0            1            1            0
    Algebra           1            1            1            1
    Calculus          1            0            1            1
    Linear            0            1            1            0
    Math              2            1            1            1
    SVD               1            1            1            2

which shows which documents contain which terms and how many times they appear.

From the above table we define the matrix A, and from its SVD the matrices U, S, and V (for instance with [ U, S, V ] = svd( A ) in MATLAB). According to the diagonal entries of the matrix S, the rank of the matrix A is four. Now suppose we are only interested in the two largest singular values of the matrix A; so we truncate the SVD and form the rank-two approximation

    A₂ = U₂ S₂ V₂^t ,      U₂ = U( :, 1:2 ),   S₂ = S( 1:2, 1:2 ),   V₂ = V( :, 1:2 ) .

Suppose we are searching for "Linear Algebra", using the term-by-document matrix A. Our query vector is then

    q = ( 0, 1, 0, 1, 0, 0 )^t ,

with ones in the positions of the terms Algebra and Linear. Taking the cosine of the angle between q and each column of A (resp. A₂), we obtain:

                                                   Document 1   Document 2   Document 3   Document 4
    Cosine of the angle of q and columns of A         0.2673       0.6325       0.5774       0.2673
    Cosine of the angle of q and columns of A₂        0.2773       0.6428       0.5897       0.2773

Notice that both A and the truncated A₂ return documents two and three, the most relevant documents for our query "Linear Algebra".
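A minimal MATLAB sketch of this example, reproducing both rows of the table above:

    % Query "Linear Algebra" against A and against its rank-2 (LSI) truncation.
    A = [0 1 1 0;     % Abstract
         1 1 1 1;     % Algebra
         1 0 1 1;     % Calculus
         0 1 1 0;     % Linear
         2 1 1 1;     % Math
         1 1 1 2];    % SVD
    q = [0; 1; 0; 1; 0; 0];                        % query terms: Algebra and Linear
    [U, S, V] = svd(A);
    A2 = U(:,1:2) * S(1:2,1:2) * V(:,1:2)';        % rank-two approximation
    cosA  = (q' * A)  ./ (norm(q) * sqrt(sum(A .^ 2)))     % cosines against the columns of A
    cosA2 = (q' * A2) ./ (norm(q) * sqrt(sum(A2 .^ 2)))    % cosines against the columns of A2
    % In both rankings, documents 2 and 3 score highest for the query "Linear Algebra".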

Exercise. Consider a term-by-document matrix C and a query vector q. Order the documents in order of relevancy.
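One way to start is to compute the cosines and sort them. In the sketch below, C and q are placeholders only; substitute the term-by-document matrix and query vector given in the exercise.

    % A minimal starting sketch for the exercise; C and q here are placeholders only.
    C = magic(4);                                  % placeholder term-by-document matrix
    q = [1; 0; 1; 0];                              % placeholder query vector
    cosines = (q' * C) ./ (norm(q) * sqrt(sum(C .^ 2)));
    [scores, order] = sort(cosines, 'descend')     % documents listed from most to least relevant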