Properties of Matrices and Operations on Matrices

A common data structure for statistical analysis is a rectangular array, or matrix. Rows represent individual observational units, or just observations, and columns represent the variables or features that are observed for each unit. If the elements of a matrix $X$ represent numeric observations on variables in this rectangular structure, the mathematical properties of $X$ carry useful information about the observations and about the variables themselves. In addition, mathematical operations on the matrix may be useful in discovering structure in the data. These operations include various transformations and factorizations.
Symmetric Matrices

A matrix $A$ with elements $a_{ij}$ is said to be symmetric if each element $a_{ji}$ has the same value as $a_{ij}$. Symmetric matrices have useful properties that we will mention from time to time. Symmetric matrices provide a generalization of the inner product. If $A$ is symmetric and $x$ and $y$ are conformable vectors, then the bilinear form $x^T A y$ has the property that $x^T A y = y^T A x$, and hence this operation on $x$ and $y$ is commutative, which is one of the properties of an inner product. More generally, a bilinear form is a kernel function of the two vectors, and a symmetric matrix corresponds to a symmetric kernel. An important type of bilinear form is $x^T A x$, which is called a quadratic form.
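The commutativity of the bilinear form can be checked numerically. The following is a minimal sketch using NumPy; the helper name `bilinear` and the randomly generated matrix and vectors are assumptions made only for illustration.

```python
import numpy as np

def bilinear(x, A, y):
    """Return the bilinear form x^T A y as a float."""
    return float(x @ A @ y)

rng = np.random.default_rng(0)
B = rng.normal(size=(4, 4))
A = (B + B.T) / 2            # symmetrize an arbitrary square matrix
x = rng.normal(size=4)
y = rng.normal(size=4)

# For symmetric A the form gives the same value with x and y exchanged.
print(np.isclose(bilinear(x, A, y), bilinear(y, A, x)))

# The quadratic form is the special case y = x.
print(bilinear(x, A, x))
```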
Nonnegative Definite and Positive Definite Matrices

A real symmetric matrix $A$ such that for any real conformable vector $x$ the quadratic form $x^T A x$ is nonnegative, that is, such that $x^T A x \ge 0$, is called a nonnegative definite matrix. We denote the fact that $A$ is nonnegative definite by $A \succeq 0$. (Note that we consider the zero matrix, $0_{n \times n}$, to be nonnegative definite.) If the quadratic form is strictly positive for every nonzero $x$, $A$ is called a positive definite matrix and we write $A \succ 0$.
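One way to test these definitions numerically is through the eigenvalues of the matrix, since a symmetric matrix is nonnegative (positive) definite exactly when all of its eigenvalues are nonnegative (positive). The sketch below assumes NumPy; the function names and the tolerance `tol` are illustrative choices, not part of the definitions.

```python
import numpy as np

def _is_symmetric(A):
    return A.ndim == 2 and A.shape[0] == A.shape[1] and np.allclose(A, A.T)

def is_nonnegative_definite(A, tol=1e-10):
    """True if A is symmetric and its smallest eigenvalue is >= -tol."""
    A = np.asarray(A, dtype=float)
    return bool(_is_symmetric(A) and np.linalg.eigvalsh(A).min() >= -tol)

def is_positive_definite(A, tol=1e-10):
    """True if A is symmetric and its smallest eigenvalue is > tol."""
    A = np.asarray(A, dtype=float)
    return bool(_is_symmetric(A) and np.linalg.eigvalsh(A).min() > tol)

print(is_positive_definite(np.eye(3)))              # identity matrix
print(is_nonnegative_definite(np.zeros((3, 3))))    # zero matrix
print(is_positive_definite(np.zeros((3, 3))))
print(is_nonnegative_definite([[1.0, 2.0], [2.0, 1.0]]))  # eigenvalues 3 and -1
```

The tolerance guards against small rounding errors in the computed eigenvalues; a suitable value depends on the scale of the matrix.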
Systems of Linear Equations

One of the most common uses of matrices is to represent a system of linear equations $Ax = b$. Whether or not the system has a solution (that is, whether or not for a given $A$ and $b$ there is an $x$ such that $Ax = b$) depends on the number of linearly independent rows in $A$ (that is, considering each row of $A$ as being a vector). The number of linearly independent rows of a matrix, which is also the number of linearly independent columns of the matrix, is called the rank of the matrix. A matrix is said to be of full rank if its rank is equal to either its number of rows or its number of columns.
A square full rank matrix is called a nonsingular matrix. We call a matrix that is square but not full rank singular. The system $Ax = b$ has a solution if and only if $\mathrm{rank}([A \mid b]) \le \mathrm{rank}(A)$, where $[A \mid b]$ is the matrix formed from $A$ by adjoining $b$ as an additional column. (Adjoining a column can never lower the rank, so the condition amounts to the two ranks being equal.) If a solution exists, the system is said to be consistent. The common regression equations do not satisfy this condition.
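The rank condition for consistency translates directly into a numerical check. This is a minimal sketch assuming NumPy; the function name `is_consistent` and the small example matrices are made up for illustration, and `np.linalg.matrix_rank` decides rank with its default tolerance.

```python
import numpy as np

def is_consistent(A, b):
    """True if rank([A | b]) equals rank(A), so that Ax = b has a solution."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float).reshape(-1, 1)
    augmented = np.hstack([A, b])
    return bool(np.linalg.matrix_rank(augmented) == np.linalg.matrix_rank(A))

A = np.array([[1.0, 2.0],
              [2.0, 4.0]])               # singular: second row is twice the first
print(is_consistent(A, [1.0, 2.0]))      # b lies in the column space of A
print(is_consistent(A, [1.0, 3.0]))      # b does not

# A small regression-style system: three observations, two coefficients.
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
y = np.array([0.0, 1.0, 3.0])            # the three points are not collinear
print(is_consistent(X, y))
```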
Matrix Inverses

If the system $Ax = b$ is consistent, then $x = A^- b$ is a solution, where $A^-$ is any matrix such that $A A^- A = A$, as we can see by substituting $b$ for $Ax$ in $A A^- A x = A x$, which gives $A (A^- b) = b$. Given a matrix $A$, a matrix $A^-$ such that $A A^- A = A$ is called a generalized inverse of $A$, and we denote it as indicated. If $A$ is square and of full rank, the generalized inverse, which is unique, is called the inverse and is denoted by $A^{-1}$. It has a stronger property: $A A^{-1} = A^{-1} A = I$, where $I$ is the identity matrix.
To the general requirement $A A^- A = A$, we successively add three requirements that define special generalized inverses, sometimes called respectively $g_2$, $g_3$, and $g_4$ inverses. The general generalized inverse is sometimes called a $g_1$ inverse. The $g_4$ inverse is called the Moore-Penrose inverse. For a matrix $A$, a Moore-Penrose inverse, denoted by $A^+$, is a matrix that has four properties.
1. $A A^+ A = A$. Any matrix that satisfies this condition is called a generalized inverse, and as we have seen above is denoted by $A^-$. For many applications, this is the only condition necessary. Such a matrix is also called a $g_1$ inverse, an inner pseudoinverse, or a conditional inverse.
2. $A^+ A A^+ = A^+$. A matrix $A^+$ that satisfies this condition is called an outer pseudoinverse. A $g_1$ inverse that also satisfies this condition is called a $g_2$ inverse or reflexive generalized inverse, and is denoted by $A^*$.
3. $A^+ A$ is symmetric.
4. $A A^+$ is symmetric.
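The four conditions are easy to test numerically for a candidate matrix. The sketch below assumes NumPy; `penrose_conditions` is a hypothetical helper, and `G1` is a hand-picked matrix that satisfies only the first condition for the chosen singular `A`. NumPy's `np.linalg.pinv` is used to obtain the Moore-Penrose inverse.

```python
import numpy as np

def penrose_conditions(A, G):
    """Report which of the four Penrose conditions G satisfies for A."""
    AG, GA = A @ G, G @ A
    return [
        bool(np.allclose(AG @ A, A)),   # 1. A G A = A  (g1 inverse)
        bool(np.allclose(GA @ G, G)),   # 2. G A G = G  (outer pseudoinverse)
        bool(np.allclose(GA, GA.T)),    # 3. G A is symmetric
        bool(np.allclose(AG, AG.T)),    # 4. A G is symmetric
    ]

A = np.array([[1.0, 0.0],
              [0.0, 0.0]])              # singular, so there is no ordinary inverse
G1 = np.array([[1.0, 1.0],
               [1.0, 2.0]])             # a g1 inverse of A, chosen by hand

print(penrose_conditions(A, G1))                 # only the first condition holds
print(penrose_conditions(A, np.linalg.pinv(A)))  # all four conditions hold

# Any g1 inverse yields a solution of a consistent system Ax = b.
b = np.array([2.0, 0.0])
x = G1 @ b
print(np.allclose(A @ x, b))
```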
The Matrix $X^T X$

When numerical data are stored in the usual way in a matrix $X$, the matrix $X^T X$ often plays an important role in statistical analysis. A matrix of this form is called a Gramian matrix, and it has some interesting properties. First of all, we note that $X^T X$ is symmetric; that is, the $(i,j)$th element, $\sum_k x_{ki} x_{kj}$, is the same as the $(j,i)$th element. Secondly, because for any $y$, $(Xy)^T Xy \ge 0$, $X^T X$ is nonnegative definite. Next we note that $X^T X = 0 \iff X = 0$.
The generalized inverses of $X^T X$ have useful properties. First, we see from the definition, for any generalized inverse $(X^T X)^-$, that $((X^T X)^-)^T$ is also a generalized inverse of $X^T X$. (Note that $(X^T X)^-$ is not necessarily symmetric.) Also, we have $X (X^T X)^- X^T X = X$. This means that $(X^T X)^- X^T$ is a generalized inverse of $X$. The Moore-Penrose inverse of $X$ has an interesting relationship with a generalized inverse of $X^T X$: $X X^+ = X (X^T X)^- X^T$.
An important property of $X (X^T X)^- X^T$ is its invariance to the choice of the generalized inverse of $X^T X$. The matrix $X (X^T X)^- X^T$ has a number of other interesting properties in addition to those mentioned above. In particular,

$$\left(X (X^T X)^- X^T\right)\left(X (X^T X)^- X^T\right) = X (X^T X)^- (X^T X)(X^T X)^- X^T = X (X^T X)^- X^T;$$

that is, $X (X^T X)^- X^T$ is idempotent. It is clear that the only idempotent matrix that is of full rank is the identity $I$.
Any real symmetric idempotent matrix is a projection matrix. The most familiar application of the matrix $X (X^T X)^- X^T$ is in the analysis of the linear regression model $y = X\beta + \epsilon$. This matrix projects the observed vector $y$ onto a lower-dimensional subspace that represents the fitted model: $\hat{y} = X (X^T X)^- X^T y$. Projection matrices, as the name implies, generally transform or project a vector onto a lower-dimensional subspace.
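These properties of the projection matrix can be illustrated numerically. The following sketch assumes NumPy and a made-up rank-deficient design matrix (an intercept plus two group indicators), so that $X^T X$ is singular and has many generalized inverses. The second generalized inverse `G_alt` is built from the first as $G + U - G A U A G$ with $A = X^T X$ and an arbitrary matrix $U$, a standard way to generate another $g_1$ inverse.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical design: intercept plus two group indicators. The first column
# is the sum of the other two, so X^T X is singular.
X = np.array([[1, 1, 0],
              [1, 1, 0],
              [1, 0, 1],
              [1, 0, 1],
              [1, 0, 1]], dtype=float)
y = rng.normal(size=5)

XtX = X.T @ X
G = np.linalg.pinv(XtX)                   # one generalized inverse of X^T X
U = rng.normal(size=XtX.shape)            # arbitrary matrix
G_alt = G + U - G @ XtX @ U @ XtX @ G     # another one, generally not symmetric

H = X @ G @ X.T
H_alt = X @ G_alt @ X.T

print(np.allclose(XtX @ G_alt @ XtX, XtX))          # G_alt is a generalized inverse
print(np.allclose(H, H_alt))                        # invariance to the choice
print(np.allclose(H @ H, H), np.allclose(H, H.T))   # idempotent and symmetric
print(np.allclose(H, X @ np.linalg.pinv(X)))        # X X^+ = X (X^T X)^- X^T

# The fitted values agree with those from a least squares solver.
y_hat = H @ y
beta = np.linalg.lstsq(X, y, rcond=None)[0]
print(np.allclose(y_hat, X @ beta))
```

Forming the projection matrix explicitly is convenient for illustration; for fitting a model in practice, a least squares routine such as `np.linalg.lstsq` is usually the better numerical choice.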
Eigenvalues and Eigenvectors

Multiplication of a given vector by a square matrix may result in a scalar multiple of the vector. If $A$ is an $n \times n$ matrix, $v$ is a vector not equal to 0, and $c$ is a scalar such that $Av = cv$, we say $v$ is an eigenvector of $A$ and $c$ is an eigenvalue of $A$. We should note how remarkable the relationship $Av = cv$ is: the effect of a matrix multiplication of an eigenvector is the same as a scalar multiplication of the eigenvector. The eigenvector is an invariant of the transformation in the sense that its direction does not change under the matrix multiplication transformation.
We immediately see that if an eigenvalue of a matrix $A$ is 0, then $A$ must be singular. We also note that if $v$ is an eigenvector of $A$, and $t$ is any nonzero scalar, $tv$ is also an eigenvector of $A$. Hence, we can normalize eigenvectors, and we often do. If $A$ is symmetric there are several useful facts about its eigenvalues and eigenvectors. The eigenvalues and eigenvectors of a (real) symmetric matrix are all real.
The eigenvectors of a symmetric matrix are (or can be chosen to be) mutually orthogonal. We can therefore represent a symmetric matrix $A$ as $A = V C V^T$, where $V$ is an orthogonal matrix whose columns are the eigenvectors of $A$ and $C$ is a diagonal matrix whose $(i,i)$th element is the eigenvalue corresponding to the eigenvector in the $i$th column of $V$. This is called the diagonal factorization of $A$.
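The diagonal factorization can be computed with NumPy's `np.linalg.eigh`, which is intended for symmetric (Hermitian) matrices and returns real eigenvalues in ascending order together with orthonormal eigenvectors. The small matrix below is an arbitrary example chosen for illustration.

```python
import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])       # symmetric example matrix

c, V = np.linalg.eigh(A)              # eigenvalues c, eigenvectors in the columns of V
C = np.diag(c)

print(np.allclose(V.T @ V, np.eye(3)))           # V is orthogonal
print(np.allclose(V @ C @ V.T, A))               # A = V C V^T
print(np.allclose(A @ V[:, 0], c[0] * V[:, 0]))  # A v = c v for the first pair
```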
If $A$ is a nonnegative (positive) definite matrix, and $c$ is an eigenvalue with corresponding eigenvector $v$, then multiplying both sides of the equation $Av = cv$ on the left by $v^T$ gives $v^T A v = c\, v^T v \ge 0\ (> 0)$, and since $v^T v > 0$, we have $c \ge 0\ (> 0)$. The maximum modulus of any eigenvalue of a given matrix is of interest. This value is called the spectral radius, and for the matrix $A$ it is denoted by $\rho(A)$: $\rho(A) = \max_i |c_i|$, where the $c_i$ are the eigenvalues of $A$. The spectral radius is very important in many applications, from both computational and statistical standpoints. The convergence of some iterative algorithms, for example, depends on bounds on the spectral radius.
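The spectral radius follows directly from the computed eigenvalues. The sketch below assumes NumPy; `spectral_radius` is an illustrative helper and the two matrices are arbitrary examples. The last line shows one instance of the link to convergence: powers of a matrix whose spectral radius is less than 1 shrink toward the zero matrix.

```python
import numpy as np

def spectral_radius(A):
    """Largest modulus among the (possibly complex) eigenvalues of A."""
    return float(np.max(np.abs(np.linalg.eigvals(A))))

R = np.array([[0.0, 1.0],
              [-1.0, 0.0]])     # rotation by 90 degrees; eigenvalues are +i and -i
M = np.array([[0.5, 0.4],
              [0.1, 0.3]])      # spectral radius below 1

print(spectral_radius(R))
print(spectral_radius(M))

# Powers of M decay because its spectral radius is less than 1.
print(np.linalg.norm(np.linalg.matrix_power(M, 50)))
```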
Matrix Decomposition

Computations with matrices are often facilitated by first decomposing the matrix into multiplicative factors that are easier to work with computationally, or else reveal some important characteristics of the matrix. Some decompositions exist only for special types of matrices, such as symmetric matrices or positive definite matrices.
The Singular Value Decomposition

One of the most useful decompositions, and one that applies to all types of matrices, is the singular value decomposition. An $n \times m$ matrix $A$ can be factored as $A = U D V^T$, where $U$ is an $n \times n$ orthogonal matrix, $V$ is an $m \times m$ orthogonal matrix, and $D$ is an $n \times m$ diagonal matrix with nonnegative entries. The number of positive entries in $D$ is the same as the rank of $A$. This factorization is called the singular value decomposition (SVD) or the canonical singular value factorization of $A$.
Singular Values and the Singular Value Decomposition

The elements on the diagonal of $D$, $d_i$, are called the singular values of $A$. We can rearrange the entries in $D$ so that $d_1 \ge d_2 \ge \cdots$, and by rearranging the columns of $U$ and $V$ correspondingly, nothing is changed. If the rank of the matrix is $r$, we have $d_1 \ge \cdots \ge d_r > 0$, and if $r < \min(n, m)$, then $d_{r+1} = \cdots = d_{\min(n,m)} = 0$. In this case

$$D = \begin{bmatrix} D_r & 0 \\ 0 & 0 \end{bmatrix},$$

where $D_r = \mathrm{diag}(d_1, \ldots, d_r)$. From the factorization defining the singular values, we see that the singular values of $A^T$ are the same as those of $A$.
For a matrix with more rows than columns, in an alternate definition of the singular value decomposition, the matrix $U$ is $n \times m$ with orthogonal columns, and $D$ is an $m \times m$ diagonal matrix with nonnegative entries. Likewise, for a matrix with more columns than rows, the singular value decomposition can be defined as above but with the matrix $V$ being $m \times n$ with orthogonal columns and $D$ being $n \times n$ and diagonal with nonnegative entries. If $A$ is symmetric, its singular values are the absolute values of its eigenvalues.
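Both forms of the decomposition are available from `np.linalg.svd`, selected with its `full_matrices` argument; the function returns the singular values as a vector in descending order and returns $V^T$ rather than $V$. The sketch below assumes NumPy, a randomly generated $5 \times 3$ matrix of rank 2, and an arbitrary relative tolerance for deciding which singular values count as positive.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, r = 5, 3, 2
A = rng.normal(size=(n, r)) @ rng.normal(size=(r, m))   # n x m matrix of rank r

# Full form: U is n x n, V is m x m, D is n x m.
U, d, Vt = np.linalg.svd(A, full_matrices=True)
D = np.zeros((n, m))
D[:m, :m] = np.diag(d)                # here n >= m, so d has m entries
print(U.shape, D.shape, Vt.shape)
print(np.allclose(U @ D @ Vt, A))

# Rank = number of positive singular values (1e-10 is an assumed cutoff).
rank = int(np.sum(d > 1e-10 * d[0]))
print(rank)

# Alternate (thin) form for n > m: U is n x m, D is m x m.
U1, d1, Vt1 = np.linalg.svd(A, full_matrices=False)
print(U1.shape, np.allclose(U1 @ np.diag(d1) @ Vt1, A))

# The singular values of A^T are the same as those of A.
print(np.allclose(np.linalg.svd(A.T, compute_uv=False), d))

# For a symmetric matrix they are the absolute values of the eigenvalues.
S = np.array([[2.0, 0.0],
              [0.0, -3.0]])
sv = np.linalg.svd(S, compute_uv=False)
ev = np.abs(np.linalg.eigvalsh(S))
print(np.allclose(np.sort(sv), np.sort(ev)))
```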
SVD and the Moore-Penrose Inverse

The Moore-Penrose inverse of a matrix has a simple relationship to its SVD. If the SVD of $A$ is given by $U D V^T$, then its Moore-Penrose inverse is $A^+ = V D^+ U^T$, as is easy to verify. The Moore-Penrose inverse of $D$ is just the matrix $D^+$ formed by inverting all of the positive entries of $D$, leaving the other entries unchanged, and transposing the result so that $D^+$ is $m \times n$ when $D$ is $n \times m$.
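The relationship $A^+ = V D^+ U^T$ can be turned into a short routine. This is a sketch assuming NumPy and the thin form of the SVD; the function name `pinv_via_svd` and the relative tolerance used to decide which singular values are treated as positive are assumptions for illustration. NumPy's own `np.linalg.pinv` is shown for comparison.

```python
import numpy as np

def pinv_via_svd(A, rel_tol=1e-12):
    """Moore-Penrose inverse computed as V D^+ U^T from the thin SVD of A."""
    U, d, Vt = np.linalg.svd(A, full_matrices=False)
    d_plus = np.zeros_like(d)
    positive = d > rel_tol * d.max()      # singular values treated as nonzero
    d_plus[positive] = 1.0 / d[positive]  # invert those, leave the zeros alone
    return Vt.T @ np.diag(d_plus) @ U.T

# Rank-one example: every column is a multiple of (1, 2, 3).
A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])
A_plus = pinv_via_svd(A)

print(A_plus.shape)                              # (2, 3): transposed shape of A
print(np.allclose(A @ A_plus @ A, A))
print(np.allclose(A_plus, np.linalg.pinv(A)))
```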
Square Root Factorization of a Nonnegative Definite Matrix

If $A$ is a nonnegative definite matrix (which, for me, means that it is symmetric), its eigenvalues are nonnegative, so we can write $S = C^{1/2}$, where $S$ is a diagonal matrix whose elements are the square roots of the elements in the $C$ matrix in the diagonal factorization of $A$. Now we observe that $(V S V^T)^2 = V C V^T = A$; hence, we write $A^{1/2} = V S V^T$, and we have $(A^{1/2})^2 = A$.
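The square root can be computed directly from the diagonal factorization. The following is a minimal sketch assuming NumPy; `sqrt_nnd` is an illustrative name, the example matrix is a small Gramian made up for the purpose, and the clipping step is an added safeguard against eigenvalues that come out slightly negative through rounding.

```python
import numpy as np

def sqrt_nnd(A):
    """Square root V S V^T of a symmetric nonnegative definite matrix A."""
    c, V = np.linalg.eigh(A)          # diagonal factorization A = V C V^T
    c = np.clip(c, 0.0, None)         # remove tiny negative rounding errors
    S = np.diag(np.sqrt(c))
    return V @ S @ V.T

X = np.array([[1.0, 2.0],
              [0.0, 1.0],
              [1.0, 0.0]])
A = X.T @ X                           # a Gramian, hence nonnegative definite
A_half = sqrt_nnd(A)

print(np.allclose(A_half @ A_half, A))   # (A^{1/2})^2 = A
print(np.allclose(A_half, A_half.T))     # the square root is itself symmetric
```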