APPENDIX B

EIGENVALUES AND SINGULAR VALUE DECOMPOSITION

Learning From Data: Concepts, Theory, and Methods, Second Edition. By Vladimir Cherkassky and Filip Mulier. Copyright © 2007 John Wiley & Sons, Inc.

B.1 LINEAR EQUATIONS AND INVERSES

Problems of linear estimation can be written in terms of a linear matrix equation whose solution provides the required parameter values of the estimate. The solution of these equations depends on the concept of an inverse. However, depending on the problem, there may exist no solution (no inverse), a unique solution (unique inverse), or infinitely many solutions (many inverses). Here, we detail the conditions causing these three outcomes. For an equation with a unique solution, we describe the approach for obtaining the solution. For equations with no solution, we describe the approach for obtaining an approximate solution. For problems with many possible solutions, we describe an approach used to choose a particular solution.

Consider the linear matrix equation

$$Xw = y, \tag{B.1}$$

where $X$ is an $n \times d$ matrix and $y$ is a column vector of length $n$. The goal is to determine the vector $w$ (of length $d$) that satisfies (B.1). Typically, $X$ and $y$ are created using the data and $w$ is interpreted as a vector of parameters. The existence and uniqueness of solutions depend on the rank of the matrix $X$. The rank $r$ of a matrix is the number of linearly independent rows in the matrix. This is equivalent to the number of linearly independent columns of the matrix. The values of $r$, $n$, and $d$ indicate the types of solutions.
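As a rough numerical illustration of rank, the sketch below uses NumPy (an assumption; the text does not prescribe any software) on a small hypothetical matrix whose third column is the sum of the first two. It checks that the number of independent rows equals the number of independent columns.

```python
import numpy as np

# Hypothetical 4 x 3 data matrix (not from the text): the third column is
# the sum of the first two, so only two columns are linearly independent.
X = np.array([
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 1.0],
    [1.0, 1.0, 2.0],
    [2.0, 1.0, 3.0],
])

n, d = X.shape

# matrix_rank estimates the rank numerically from the singular values.
r = np.linalg.matrix_rank(X)

# The rank of the transpose counts independent rows instead of columns;
# the two counts agree.
r_transposed = np.linalg.matrix_rank(X.T)

print(f"n = {n}, d = {d}, r = {r}, rank of transpose = {r_transposed}")
```

Here $r < d$, so the columns of this particular matrix are linearly dependent.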
First, under the conditions $n \le d$ and $r = n$, at least one solution exists. There could, however, be infinitely many solutions. In this case, there exists at least one right inverse $C$ such that $XC = I_n$, where $I_n$ is the $n \times n$ identity matrix. For a given right inverse $C$, a solution is determined in the following manner:

$$Xw = XCy, \qquad w = Cy. \tag{B.2}$$

Because $XC = I_n$, the right-hand side $XCy$ equals $y$, so $w = Cy$ satisfies (B.1). There may be many right inverses, as the possibility exists for many solutions. One right inverse that always provides a solution is

$$C = X^T (X X^T)^{-1}. \tag{B.3}$$

This particular right inverse always provides the solution $w$ that has the minimum norm out of all possible solutions. An inverse of this type, which imposes additional constraints in order to provide a unique solution, is often called a pseudoinverse.

Second, under the conditions $n \ge d$ and $r = d$, either a unique solution exists or no solution exists. If a unique solution exists, a left inverse $B$ exists such that $BX = I_d$, where $I_d$ is the $d \times d$ identity matrix. If a unique solution exists, it is determined in the following manner:

$$BXw = By, \qquad w = By. \tag{B.4}$$

It is also possible that no solution exists. In this case, we may want to find an approximate solution, in a least-squares sense, minimizing

$$\| Xw - y \|^2. \tag{B.5}$$

The unique minimizer of this least-squares problem is the solution of the normal equation

$$X^T X w = X^T y. \tag{B.6}$$

The solution in both cases (exact, or approximate via the normal equation) is provided by the left inverse

$$B = (X^T X)^{-1} X^T. \tag{B.7}$$

Note that an exact solution is also a least-squares solution, with a minimum value of zero.

A square matrix ($n = d$) with $r = n = d$ satisfies both conditions. That means there always exists a single unique solution to the linear equation (B.1). The matrix $X$ has identical right and left inverses, denoted as $X^{-1}$. Notice
that the definitions of right and left inverses (B.3) and (B.7) require inversion of the square matrices $XX^T$ or $X^TX$. These inverses exist under the first and second conditions, respectively: $XX^T$ is invertible when $r = n$, and $X^TX$ is invertible when $r = d$.

In many practical linear estimation problems with $n > d$, the columns of $X$ may not be linearly independent. In this case, the second condition is not satisfied, as $r < d$. This would result in multiple solutions minimizing (B.5). Using the singular value decomposition (SVD), it is possible to develop a left pseudoinverse that imposes additional constraints to provide a unique solution. This is discussed further in Section B.3.

B.2 EIGENVECTORS AND EIGENVALUES

The eigenvectors of a matrix define directions in which the effect of the matrix is particularly simple. For a vector in one of these directions, multiplication with the matrix only scales the vector; its direction is left unchanged. The scaling factor is given by the eigenvalue. The eigenvalues are used in Section 7.2.3 for determining the effective degrees of freedom of a linear estimator. The eigenvectors and eigenvalues satisfy the equation

$$Au = \lambda u, \tag{B.8}$$

where $A$ is a square matrix of size $n \times n$, $u$ is one of the $n$ eigenvectors, and $\lambda$ is the corresponding eigenvalue. Equation (B.8) is equivalent to

$$(A - \lambda I)u = 0. \tag{B.9}$$

This equation has a nonzero solution $u$ if and only if the matrix $A - \lambda I$ is singular. Therefore, the eigenvalues of $A$ satisfy

$$\det(A - \lambda I) = 0. \tag{B.10}$$

The left-hand side of (B.10) is a polynomial of degree $n$ in $\lambda$ and is called the characteristic polynomial of $A$. Its roots are the eigenvalues of $A$. In general, the eigenvalues of a matrix are not necessarily distinct. Also, an eigenvector can be scaled by any nonzero constant and still satisfy (B.8) with the same eigenvalue. Therefore, eigenvectors are usually normalized to unit length. For matrices that are symmetric ($A = A^T$), the eigenvalues are real and the eigenvectors can be chosen to be orthogonal.
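The properties above can be checked numerically. The following is a minimal sketch, assuming NumPy and a small hypothetical symmetric matrix (neither comes from the text); it verifies (B.8) and (B.10) and the orthogonality of the unit-length eigenvectors.

```python
import numpy as np

# Hypothetical symmetric matrix chosen for illustration. Its characteristic
# polynomial factors as (2 - l)(l - 1)(l - 4), so the eigenvalues are 1, 2, 4.
A = np.array([
    [2.0, 1.0, 0.0],
    [1.0, 3.0, 1.0],
    [0.0, 1.0, 2.0],
])

# eigh is NumPy's routine for symmetric matrices. It returns real
# eigenvalues in ascending order and unit-length eigenvectors as the
# columns of U.
eigenvalues, U = np.linalg.eigh(A)

for lam, u in zip(eigenvalues, U.T):
    # (B.8): multiplying by A only scales the eigenvector.
    assert np.allclose(A @ u, lam * u)
    # (B.10): A - lambda*I is singular at each eigenvalue.
    assert abs(np.linalg.det(A - lam * np.eye(3))) < 1e-9
    # Each eigenvector returned by eigh has unit length.
    assert np.isclose(np.linalg.norm(u), 1.0)

# Symmetric A: the eigenvectors are mutually orthogonal.
print("eigenvalues:", eigenvalues)
print("eigenvectors orthonormal:", np.allclose(U.T @ U, np.eye(3)))
```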
If $A$ is a symmetric matrix, it is possible to use the eigenvectors to diagonalize $A$:

$$U^T A U = D = \begin{bmatrix} \lambda_1 & & \\ & \ddots & \\ & & \lambda_n \end{bmatrix}, \tag{B.11}$$
where the matrix $D$ is constructed by placing the eigenvalues on the diagonal, and the columns of $U$ consist of the corresponding eigenvectors. As the eigenvectors are orthogonal and normalized to unit length, the following identity applies:

$$U^T U = U U^T = I. \tag{B.12}$$

This allows us to rewrite Eq. (B.11) as the eigen decomposition

$$A = U D U^T. \tag{B.13}$$

The sum of the diagonal entries of $A$ is defined as the trace of $A$. It equals the sum of the eigenvalues. Some additional useful properties of the eigen decomposition are

Inverse (when all eigenvalues are nonzero):

$$A^{-1} = U D^{-1} U^T. \tag{B.14}$$

Powers of $A$:

$$A^k = U D^k U^T. \tag{B.15}$$

B.3 SINGULAR VALUE DECOMPOSITION

The SVD is a decomposition similar to the eigen decomposition that applies to rectangular matrices. The singular values of a matrix describe its scaling effect. The decomposition is in terms of two (different) orthogonal matrices and a diagonal one. Applications of the SVD include computing the generalized inverse used in Chapter 5 and determining the principal components in Chapter 6.

Let $X$ be a rectangular matrix of size $n \times d$. The SVD of $X$ is

$$X = U \Sigma V^T, \tag{B.16}$$

where $U$ is an $n \times n$ orthogonal matrix, $V$ is a $d \times d$ orthogonal matrix, and $\Sigma$ is an $n \times d$ matrix with singular values filling the first $r$ places on the diagonal:

$$s_{ij} = \begin{cases} s_i \ge 0, & i = j \text{ and } i \le r, \\ 0, & i \ne j \text{ or } i > r. \end{cases} \tag{B.17}$$

The value $r$ is the rank of $X$. The SVD is related to the eigen decomposition in the following way:

1. The columns of $U$ are the eigenvectors of $XX^T$.
2. The columns of $V$ are the eigenvectors of $X^TX$.
3. The singular values on the diagonal of $\Sigma$ are the square roots of the nonzero eigenvalues of both $XX^T$ and $X^TX$. (The products $XX^T$ and $X^TX$ have the same nonzero eigenvalues.)
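The three relations listed above can be verified numerically. This is a minimal sketch, assuming NumPy and a random hypothetical $5 \times 3$ matrix (both are illustrative choices, not part of the text).

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 3
X = rng.standard_normal((n, d))  # hypothetical data matrix

# Full SVD: U is n x n, Vt is V transposed (d x d), and s holds the
# singular values in descending order.
U, s, Vt = np.linalg.svd(X, full_matrices=True)
V = Vt.T

# Build the n x d matrix Sigma of (B.17) from the singular values.
Sigma = np.zeros((n, d))
Sigma[:len(s), :len(s)] = np.diag(s)

# (B.16): the three factors reproduce X.
assert np.allclose(U @ Sigma @ Vt, X)

# Relation 1: columns of U are eigenvectors of X X^T
# (eigenvalues s_i^2, padded with zeros).
assert np.allclose(X @ X.T @ U, U @ (Sigma @ Sigma.T))

# Relation 2: columns of V are eigenvectors of X^T X (eigenvalues s_i^2).
assert np.allclose(X.T @ X @ V, V * s**2)

# Relation 3: squared singular values match the eigenvalues of X^T X and
# the nonzero eigenvalues of X X^T (eigvalsh returns ascending order).
eig_XtX = np.linalg.eigvalsh(X.T @ X)[::-1]
eig_XXt = np.linalg.eigvalsh(X @ X.T)[::-1]
assert np.allclose(eig_XtX, s**2)
assert np.allclose(eig_XXt[:d], s**2)
assert np.allclose(eig_XXt[d:], 0.0)

print("singular values:", s)
```

The `full_matrices=True` option is used so that the shapes of `U` and `V` match the $n \times n$ and $d \times d$ matrices of (B.16); a thin SVD would also reconstruct $X$.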
The SVD provides a stable solution to least-squares problems. The least-squares solution for estimating $w$ in the rectangular system

$$Xw \approx y, \tag{B.18}$$

where $n > d$, is given by solving the normal equation

$$X^T X w = X^T y. \tag{B.19}$$

The solution to the normal equation is provided by the left inverse

$$w = (X^T X)^{-1} X^T y. \tag{B.20}$$

There are two possible difficulties with solving (B.18) exactly:

1. The rows of $X$ may be linearly dependent (i.e., no solution may exist).
2. The columns of $X$ may be linearly dependent (i.e., no unique solution may exist).

Note that computational procedures for solving (B.18) may provide unstable solutions even before conditions 1 and 2 are met exactly, that is, when the rows or columns are only nearly linearly dependent. As discussed in Section B.1, the solution (B.20) provided by the normal equation is designed to provide an approximate solution to (B.18) even if the first difficulty occurs. However, if the columns of $X$ are linearly dependent, the normal equations do not provide a unique solution. A unique solution $w^+$ exists if we apply an additional constraint on the possible solutions $w$: the solution $w^+$ is the one with minimum ($L_2$) norm $\| w \|$. This unique solution to the normal equations is provided by using the left pseudoinverse of $X$, denoted as $X^+$. The left pseudoinverse is defined in terms of the SVD of $X$ in the following manner:

$$X^+ = V \Sigma^+ U^T, \tag{B.21}$$

where $\Sigma^+$ is the $d \times n$ matrix with the reciprocals of the nonzero singular values, $1/s_i$, on the diagonal and zeros elsewhere. The least-squares solution to (B.18) with minimum $L_2$ norm is given by

$$w^+ = X^+ y. \tag{B.22}$$

Note that the left pseudoinverse always exists regardless of whether the matrix is square or of full rank. This inverse provides a suitable generalization of the regular matrix inverse. The left pseudoinverse $X^+$ is the same as the regular matrix inverse $X^{-1}$ if $X$ is square and nonsingular. In addition, if the normal equations do provide a unique solution, this solution is also provided using the left pseudoinverse.
For these reasons, the left pseudoinverse is often called the generalized inverse. It provides a general-purpose procedure for solving Eq. (B.18) when $n \ge d$.
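The construction in (B.21) and (B.22) can be sketched as follows. This is an illustrative implementation only, assuming NumPy; the function name `pseudoinverse_via_svd`, the cutoff `tol` for treating a singular value as zero, and the example matrices are all hypothetical choices rather than anything specified in the text.

```python
import numpy as np

def pseudoinverse_via_svd(X, tol=1e-10):
    """Sketch of (B.21): X^+ = V Sigma^+ U^T.

    `tol` is an assumed relative cutoff below which a singular value is
    treated as zero (its reciprocal is then left at zero).
    """
    # Thin SVD: U is n x k, Vt is k x d, with k = min(n, d).
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    s_inv = np.zeros_like(s)
    nonzero = s > tol * s.max()
    s_inv[nonzero] = 1.0 / s[nonzero]
    return Vt.T @ np.diag(s_inv) @ U.T

y = np.array([1.0, 2.0, 2.0, 5.0])

# Case 1: full column rank (r = d). X^+ agrees with the left inverse (B.7).
X_full = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
left_inverse = np.linalg.inv(X_full.T @ X_full) @ X_full.T
assert np.allclose(pseudoinverse_via_svd(X_full), left_inverse)

# Case 2: dependent columns (r < d). The third column is the sum of the
# first two, so X^T X is singular and (B.20) cannot be used directly.
X_def = np.column_stack([X_full, X_full.sum(axis=1)])
w_plus = pseudoinverse_via_svd(X_def) @ y          # (B.22)
assert np.allclose(X_def.T @ X_def @ w_plus, X_def.T @ y)  # normal equation

# Adding a null-space vector z (X_def @ z = 0) gives another least-squares
# solution with the same fit but a larger norm.
z = np.array([1.0, 1.0, -1.0])
w_other = w_plus + 0.5 * z
assert np.allclose(X_def @ w_other, X_def @ w_plus)
assert np.linalg.norm(w_plus) < np.linalg.norm(w_other)

# Case 3: square and nonsingular. X^+ equals the regular inverse.
S = np.array([[2.0, 1.0], [1.0, 3.0]])
assert np.allclose(pseudoinverse_via_svd(S), np.linalg.inv(S))
```

In practice a library routine such as `numpy.linalg.pinv` serves the same purpose; the explicit version is shown only to mirror the formula, and the choice of cutoff affects how nearly dependent columns are handled.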