Numerical Linear Algebra
Numerical linear algebra has numerous applications in statistics, particularly in the fitting of linear models.
Notation and conventions:
Elements of a matrix $A$ are denoted by $a_{ij}$, where $i$ indexes the rows and $j$ the columns.
$A'$ denotes the transpose of $A$. If $B = A'$, then $b_{ij} = a_{ji}$.
A vector $x$ is a column vector, and $x'$ is a row vector.
Square Matrices
The main diagonal of a square matrix $A$ consists of the elements $a_{ii}$.
Sub-diagonal elements are those below the main diagonal (all $a_{ij}$ such that $i > j$).
Super-diagonal elements are all $a_{ij}$ such that $i < j$.
$A$ is symmetric if $a_{ij} = a_{ji}$ for all $i$ and $j$.
An upper triangular matrix has all sub-diagonal elements equal to 0.
A lower triangular matrix has all super-diagonal elements equal to 0.
A diagonal matrix has all elements equal to 0 except for the elements on the main diagonal.
An identity matrix is a diagonal matrix $I$ with 1's on the main diagonal.
A symmetric matrix $A$ is positive definite if $x'Ax > 0$ for all $x \neq 0$.
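The definitions above can be checked directly in R. The following is a minimal sketch; the matrix values are made up for illustration, and the positive-definiteness check relies on the standard fact that a symmetric matrix is positive definite exactly when all of its eigenvalues are positive.

```r
# Illustrative 3 x 3 symmetric matrix (values chosen for this sketch only)
A <- matrix(c(4, 1, 0,
              1, 3, 1,
              0, 1, 2), nrow = 3, byrow = TRUE)

is_sym <- isSymmetric(A)            # TRUE when a_ij == a_ji for all i, j

U <- A; U[lower.tri(U)] <- 0        # zero the sub-diagonal: upper triangular
L <- A; L[upper.tri(L)] <- 0        # zero the super-diagonal: lower triangular
D <- diag(diag(A))                  # keep only the main diagonal
I3 <- diag(3)                       # 3 x 3 identity matrix

# Symmetric A is positive definite iff all eigenvalues are > 0
ev <- eigen(A, symmetric = TRUE, only.values = TRUE)$values
is_pd <- is_sym && all(ev > 0)
```

Here `lower.tri()` and `upper.tri()` return logical masks that exclude the main diagonal, which matches the sub-diagonal and super-diagonal definitions.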
Matrix Operations in R
In R, a matrix is an array with two subscripts. However, there are utilities in R that apply only to matrices, so be careful about using the appropriate object type when you want to work with matrices.
> # create a matrix:
> A=cbind(c(1,2,3),c(4,5,6),c(7,8,9))
> A
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
> is.matrix(A)
[1] TRUE
> # alternative way:
> A=matrix(c(1:9),nrow=3,ncol=3)
> A
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
More Matrix Operations in R
> # get the transpose:
> t(A)
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9
> # multiply matrices:
> A%*%c(1,0,0)
     [,1]
[1,]    1
[2,]    2
[3,]    3
> c(1,0,0)%*%A
     [,1] [,2] [,3]
[1,]    1    4    7
> cbind(c(1,0,0))%*%A
Error in cbind(c(1, 0, 0)) %*% A : non-conformable arguments
> t(cbind(c(1,0,0)))%*%A
     [,1] [,2] [,3]
[1,]    1    4    7
More Matrix Operations in R
> # addition/subtraction
> A-matrix(2,nrow=3,ncol=3)
     [,1] [,2] [,3]
[1,]   -1    2    5
[2,]    0    3    6
[3,]    1    4    7
> # element-wise -- as opposed to matrix -- multiplication
> A*matrix(2,nrow=3,ncol=3)
     [,1] [,2] [,3]
[1,]    2    8   14
[2,]    4   10   16
[3,]    6   12   18
> A%*%matrix(2,nrow=3,ncol=3)
     [,1] [,2] [,3]
[1,]   24   24   24
[2,]   30   30   30
[3,]   36   36   36
More Matrix Operations in R
> # symmetric matrices and eigenvalues:
> S=cbind(c(1,2,3),c(2,1,2),c(3,2,1))
> S
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    2    1    2
[3,]    3    2    1
> eigen(S)
$values
[1]  5.7015621 -0.7015621 -2.0000000

$vectors
          [,1]       [,2]          [,3]
[1,] 0.6059128  0.3645129  7.071068e-01
[2,] 0.5154991 -0.8568901  1.75740e-16
[3,] 0.6059128  0.3645129 -7.071068e-01
(The [2,3] entry of $vectors is zero up to rounding error.)
Available Libraries in R
See R_ext/Linpack.h for details about the BLAS, LINPACK, and EISPACK libraries of FORTRAN subroutines.
For some description of these three libraries, see http://www.netlib.org/lapack.
From Writing R Extensions: These are expressed as calls to FORTRAN subroutines, and they will also be usable from users' FORTRAN code.
Solving Systems of Equations
Consider solving the system $Ax = b$ for $x$, given the $p \times p$ matrix $A$ and the vector $b$. In scalar terms, this involves solving
$$\sum_{j=1}^{p} a_{ij} x_j = b_i, \quad i = 1, \ldots, p,$$
for $x_1, \ldots, x_p$.
It's generally better to calculate $A^{-1}b$ as the solution to $Ax = b$ than to calculate $A^{-1}$ directly and multiply.
solve() in R
> A=cbind(c(2,1,2),c(8,3,7),c(3,2,4))
> A
     [,1] [,2] [,3]
[1,]    2    8    3
[2,]    1    3    2
[3,]    2    7    4
> b=cbind(c(2,5,8))
> # in current versions of R, solve() uses an LU factorization (LAPACK's DGESV);
> # qr.solve() is the QR-based alternative:
> x=solve(A,b)
> x
     [,1]
[1,]    3
[2,]   -2
[3,]    4
> A%*%x
     [,1]
[1,]    2
[2,]    5
[3,]    8
solve() in R, continued
> # to obtain the inverse of A:
> Ai=solve(A)
> Ai
     [,1] [,2] [,3]
[1,]    2   11   -7
[2,]    0   -2    1
[3,]   -1   -2    2
> # getting x directly, with the inverse of A (not a good idea):
> x=Ai%*%b
> x
     [,1]
[1,]    3
[2,]   -2
[3,]    4
Upper and Lower Triangular Matrices
For an upper triangular $A$, the system $Ax = b$ can be written as:
$$\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1p}x_p &= b_1 \\
a_{22}x_2 + \cdots + a_{2p}x_p &= b_2 \\
&\;\;\vdots \\
a_{pp}x_p &= b_p
\end{aligned}$$
This can be readily solved with backward substitution, starting with the last equation:
$$\begin{aligned}
x_p &= b_p / a_{pp}, \\
x_{p-1} &= (b_{p-1} - a_{p-1,p}x_p) / a_{p-1,p-1}, \\
&\;\;\vdots \\
x_1 &= (b_1 - a_{12}x_2 - \cdots - a_{1p}x_p) / a_{11}.
\end{aligned}$$
There is an analogous forward substitution algorithm for lower triangular systems.
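As a minimal sketch of backward substitution, the loop below solves an upper triangular system from the last equation upward. The helper name `back_sub` and the numeric values are illustrative assumptions; in practice R's built-in `backsolve()` does the same job.

```r
# Backward substitution for an upper triangular system Ax = b.
# back_sub is a hypothetical helper written for this sketch.
back_sub <- function(A, b) {
  p <- length(b)
  x <- numeric(p)
  for (i in p:1) {
    # contribution of the already-solved components x_{i+1}, ..., x_p
    s <- if (i < p) sum(A[i, (i + 1):p] * x[(i + 1):p]) else 0
    x[i] <- (b[i] - s) / A[i, i]
  }
  x
}

A <- rbind(c(1, 2, 3),
           c(0, 1, 1),
           c(0, 0, 2))
b <- c(8, 4, 2)
x <- back_sub(A, b)        # agrees with backsolve(A, b)
```

The sketch assumes every diagonal element of A is nonzero; a zero on the diagonal means the system is singular.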
Triangular Matrices in R
> A
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    0    1    1
[3,]    0    0    2
> b=c(8,4,2)
> x=backsolve(A,b)
> x
[1] -1  3  1
Gaussian Elimination
Recall that Gaussian elimination (GE) involves augmenting the matrix $A$ with an additional column containing $b$, followed by these steps:
1) We first reduce the $A$ portion of this matrix to upper triangular form using elementary row operations.
2) We next work in reverse, starting from the last row and working our way up, reducing the $A$ portion to an identity matrix.
What remains in the last column is the solution $x$.
Try this for the system in the R solve() example.
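The two steps can be sketched in R as follows. This is a bare-bones illustration rather than production code: the function name `gauss_solve` is hypothetical, no pivoting is done, and the sketch assumes every pivot it encounters is nonzero. The matrix and right-hand side are the ones from the solve() example.

```r
# Gaussian elimination on the augmented matrix [A | b].
# No pivoting: assumes all pivots are nonzero.
gauss_solve <- function(A, b) {
  p <- nrow(A)
  M <- cbind(A, b)                         # augmented matrix
  # Step 1: reduce the A portion to upper triangular form
  for (k in seq_len(p - 1)) {
    for (i in (k + 1):p) {
      M[i, ] <- M[i, ] - (M[i, k] / M[k, k]) * M[k, ]
    }
  }
  # Step 2: work upward, reducing the A portion to the identity
  for (k in p:1) {
    M[k, ] <- M[k, ] / M[k, k]
    if (k > 1) for (i in 1:(k - 1)) {
      M[i, ] <- M[i, ] - M[i, k] * M[k, ]
    }
  }
  unname(M[, p + 1])                       # last column holds x
}

A <- cbind(c(2, 1, 2), c(8, 3, 7), c(3, 2, 4))
b <- c(2, 5, 8)
x <- gauss_solve(A, b)
```

Library routines add row exchanges (partial pivoting) to step 1 so that small or zero pivots do not destroy accuracy.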
The LU Decomposition
Note that Gaussian elimination can be viewed simply as the factoring of $A$ into the product of a lower triangular matrix $L$ and an upper triangular matrix $U$.
The matrix $U$ is the matrix left in the $A$ portion of the augmented matrix used for GE when $A$ is reduced to upper triangular form.
The sub-diagonal elements of the matrix $L$ represent the multipliers used at each stage of GE. The diagonal elements of $L$ are all 1's.
Computing the LU Decomposition
Explicit formulas for the elements of $U$ and $L$ are given by
$$u_{ij} = a_{ij} - \sum_{k=1}^{i-1} l_{ik}u_{kj}, \quad i = 1, \ldots, j;$$
$$l_{ij} = \frac{1}{u_{jj}}\Big(a_{ij} - \sum_{k=1}^{j-1} l_{ik}u_{kj}\Big), \quad i = j+1, \ldots, p.$$
Once $L$ and $U$ are computed, we solve $Ax = LUx = b$ first by using a forward substitution to solve for $y$ in $Ly = b$, and then using a backward substitution to solve for $x$ in $Ux = y$.
Advantages of the LU Approach
No additional computations are needed (beyond what's required for GE).
Solutions for any right-hand vector $b$ can be computed without redoing the GE; $b$ is not needed when $A$ is factored.
LU yields other useful quantities; e.g., $\det(A)$ is the product of the diagonal elements of $U$, and each of the columns of $A^{-1}$ can be computed by taking $b$ to be the corresponding column of the $p \times p$ identity matrix.
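The explicit formulas for $L$ and $U$ and the uses listed above can be sketched in R as below. The function `lu_doolittle` is a hypothetical helper written for this illustration; it does no pivoting and assumes all $u_{jj}$ are nonzero, which library implementations do not assume.

```r
# LU factorization with unit-diagonal L, following the explicit formulas.
# No pivoting: assumes every u_jj is nonzero.
lu_doolittle <- function(A) {
  p <- nrow(A)
  L <- diag(p)
  U <- matrix(0, p, p)
  for (j in 1:p) {
    for (i in 1:j) {                       # u_ij for i = 1, ..., j
      k <- seq_len(i - 1)
      U[i, j] <- A[i, j] - sum(L[i, k] * U[k, j])
    }
    if (j < p) for (i in (j + 1):p) {      # l_ij for i = j + 1, ..., p
      k <- seq_len(j - 1)
      L[i, j] <- (A[i, j] - sum(L[i, k] * U[k, j])) / U[j, j]
    }
  }
  list(L = L, U = U)
}

A <- cbind(c(2, 1, 2), c(8, 3, 7), c(3, 2, 4))
f <- lu_doolittle(A)

# Solve Ax = b: forward substitution for Ly = b, then backward for Ux = y
b <- c(2, 5, 8)
x <- backsolve(f$U, forwardsolve(f$L, b))

# Reuse the same factors: determinant and inverse without refactoring
det_A <- prod(diag(f$U))
A_inv <- backsolve(f$U, forwardsolve(f$L, diag(3)))  # one column per unit vector
```

Passing the identity matrix as the right-hand side solves for all columns of the inverse at once, which is the column-by-column idea from the slide.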
Vector Norms
Vector and matrix norms play an important role in error analysis.
A norm typically measures in some sense the magnitude of an argument. For a real number, this is ordinarily the absolute value.
For a vector $x = (x_1, x_2, \ldots, x_p)'$, three common choices are
the 1-norm, or $L_1$, defined by $\|x\|_1 = \sum_{i=1}^{p} |x_i|$,
the 2-norm, or $L_2$, defined by $\|x\|_2 = \big(\sum_{i=1}^{p} x_i^2\big)^{1/2}$,
and the $\infty$-norm, or $L_\infty$, defined by $\|x\|_\infty = \max_i |x_i|$.
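The three vector norms are one-liners in R. The vector below is an arbitrary example chosen so that the results are whole numbers.

```r
x <- c(3, -4, 12)                 # illustrative vector

norm1   <- sum(abs(x))            # 1-norm: sum of absolute values
norm2   <- sqrt(sum(x^2))         # 2-norm: Euclidean length
norminf <- max(abs(x))            # infinity-norm: largest absolute component
```

For this x the values are 19, 13, and 12, which also illustrates the general ordering $\|x\|_\infty \le \|x\|_2 \le \|x\|_1$.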
Matrix Norms
To generalize these norms to matrices, a useful (but not unique) method is to define corresponding matrix norms from the vector norms through
$$\|A\|_j = \sup_{x \neq 0} \|Ax\|_j / \|x\|_j, \quad \text{for } j = 1, 2, \infty.$$
This yields
$$\|A\|_1 = \max_j \sum_i |a_{ij}|, \qquad \|A\|_\infty = \max_i \sum_j |a_{ij}| = \|A'\|_1,$$
and a value of $\|A\|_2$ that is equal to the largest singular value of $A$.
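A short sketch of these three matrix norms in R, computed from the formulas and compared against base R's `norm()` (type "O" is the one norm, "I" the infinity norm, "2" the spectral norm). The matrix is the 3 x 3 example used earlier.

```r
A <- cbind(c(1, 2, 3), c(4, 5, 6), c(7, 8, 9))

n1   <- max(colSums(abs(A)))      # ||A||_1: largest absolute column sum
ninf <- max(rowSums(abs(A)))      # ||A||_inf: largest absolute row sum
n2   <- max(svd(A)$d)             # ||A||_2: largest singular value

# The same quantities from base R
b1   <- norm(A, "O")
binf <- norm(A, "I")
b2   <- norm(A, "2")
```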
Condition Numbers
The condition number of a square matrix $A$ is defined to be
$$\kappa_j(A) = \|A\|_j \, \|A^{-1}\|_j,$$
which is taken to be $\infty$ if $A$ is singular.
Some remarks:
The lower bound of the condition number is 1.
This yields a useful measure of how close a matrix is to singularity.
When solving a system $Ax = b$, it turns out that the relative error of the computed solution is bounded by a quantity proportional to $\kappa_j(A)$.
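As a rough illustration, the condition number can be computed straight from its definition. The matrix is the one from the solve() example; forming the inverse explicitly is done here only to mirror the definition, not as a recommended way to compute $\kappa$.

```r
A <- cbind(c(2, 1, 2), c(8, 3, 7), c(3, 2, 4))

# kappa_j(A) = ||A||_j * ||A^{-1}||_j, directly from the definition
kappa1 <- norm(A, "O") * norm(solve(A), "O")
kappa2 <- norm(A, "2") * norm(solve(A), "2")

# For j = 2 this is the ratio of the largest to the smallest singular value
d <- svd(A)$d
kappa2_svd <- max(d) / min(d)
```

Base R also provides `kappa()`, which by default returns a cheaper estimate rather than the exact value.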
Matrices and Linear Regression
In statistical applications, we often run into problems of the form
$$y_i = \sum_{j=1}^{p} x_{ij}\beta_j + \varepsilon_i,$$
where the $y_i$ are the responses, the $x_{ij}$ are the covariates, the $\beta_j$ are the regression coefficients, and the $\varepsilon_i$ represent the error terms.
If the $\varepsilon_i$ can be assumed to be independent variables with mean 0 and variance $\sigma^2$, then we often use the least squares estimators for the $\beta_j$.
The Least Squares Approach
The least squares solution is the vector $\beta = (\beta_1, \ldots, \beta_p)'$ that minimizes
$$\|y - X\beta\|_2^2 = (y - X\beta)'(y - X\beta),$$
where $y = (y_1, \ldots, y_n)'$, and $X = (x_{ij})$ with $x_{i1} = 1$ for all $i$ (if an intercept term is included).
Note that the solution $\hat\beta = (\hat\beta_1, \ldots, \hat\beta_p)'$ gives the vector of fitted values $\hat y = X\hat\beta$ that is closest (in the Euclidean norm) to the actual responses.
The Least Squares Solution
An obvious way of obtaining the solution is to set the gradient $\nabla_\beta \|y - X\beta\|_2^2 = 0$, obtaining the normal equations
$$X'X\beta = X'y.$$
This system can be solved using the methods described previously.
In particular, since $X'X$ is positive definite for full-rank $X$, the Choleski decomposition (a special case of LU factorization for positive definite matrices) can be very efficient.
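A minimal sketch of solving the normal equations with the Choleski factor in R follows. The data are simulated purely for illustration, and the variable names are assumptions of this sketch; `chol()` returns the upper triangular factor $R$ with $R'R = X'X$.

```r
set.seed(1)                               # simulated data, for illustration only
n <- 50
X <- cbind(1, rnorm(n), rnorm(n))         # intercept plus two covariates
y <- drop(X %*% c(1, 2, -1) + rnorm(n))

XtX <- crossprod(X)                       # X'X
Xty <- crossprod(X, y)                    # X'y

R <- chol(XtX)                            # upper triangular, R'R = X'X
z <- forwardsolve(t(R), Xty)              # solve R'z = X'y
beta_hat <- drop(backsolve(R, z))         # solve R beta = z
```

The two triangular solves replace an explicit inverse of $X'X$; the result should match what `lm.fit(X, y)` returns up to rounding.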
Computational Considerations
We often want a variety of different models fit (e.g., stepwise regression), so it'd be good to have a fast method for updating the fitted model when covariates are added or dropped.
Along with the solution, we may also want other quantities such as
  Residuals
  Fitted values
  Regression and error sums of squares
  Diagnostic measures (e.g., diagonal elements of the projection operator $X(X'X)^{-1}X'$, the so-called "hat matrix")
Other Options
With the practical considerations outlined on the previous slide, two very efficient techniques, the QR decomposition and the singular value decomposition (SVD), involve decomposition of $X$ directly.
Advantages:
It turns out that factoring $X$ directly is a better conditioned problem than factoring $X'X$.
QR or SVD allows us to add and drop covariates more easily, without a lot of additional work.
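The conditioning claim can be illustrated numerically: in the 2-norm, the condition number of $X'X$ is the square of that of $X$. The sketch below uses simulated, deliberately near-collinear covariates; all names and values are assumptions made for this example.

```r
set.seed(2)                               # simulated data, for illustration only
n <- 100
x1 <- rnorm(n)
x2 <- x1 + 0.01 * rnorm(n)                # nearly collinear with x1
X <- cbind(1, x1, x2)

# 2-norm condition numbers as ratios of extreme singular values
dX   <- svd(X)$d
dXtX <- svd(crossprod(X))$d
kX   <- max(dX) / min(dX)
kXtX <- max(dXtX) / min(dXtX)             # approximately kX^2
```

Squaring the condition number roughly doubles the number of digits that can be lost, which is the practical argument for factoring $X$ itself.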
Rotations and Orthogonal Matrices
A rotation in $\mathbb{R}^p$ is a linear transformation $Q: \mathbb{R}^p \to \mathbb{R}^p$ such that $\|Qx\|_2 = \|x\|_2$ for all $x$ in $\mathbb{R}^p$.
A rotation does not affect the length of vectors, but changes their orientation; it can be thought of as a change in the coordinate axes, without a change in vector length.
Properties of a Rotation Q
From $\|Qx\|_2 = \|x\|_2$ for all $x$, it follows that $x'Q'Qx = x'x$, so that $x'(Q'Q - I)x = 0$ for all $x$. This is only true if $Q'Q = I$, since $Q'Q - I$ is symmetric.
For square matrices only, $Q'Q = I$ implies $QQ' = I$. So $Q' = Q^{-1}$, and $Q$ must be of full rank. Therefore, any $x$ in $\mathbb{R}^p$ can be represented by $Qy$ for some $y$ in $\mathbb{R}^p$.
When $Q'Q = I$, the columns of $Q$ are mutually orthogonal and each has unit length; when $QQ' = I$, the same holds for the rows. For square matrices, either of these implies the other; a square matrix satisfying these properties is said to be orthogonal.
If $Q$ is a rotation, then $\|Q\|_2 = 1$. Since $Q^{-1} = Q'$ is also a rotation, $\|Q^{-1}\|_2$ is also 1, so that $\kappa_2(Q) = 1$.
If $Q_1$ and $Q_2$ are orthogonal matrices, then $(Q_1Q_2)'(Q_1Q_2) = Q_2'Q_1'Q_1Q_2 = Q_2'Q_2 = I$, so $Q_1Q_2$ is also orthogonal.
Because of these characteristics, any rotation is given by an orthogonal matrix, and vice-versa.
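These properties are easy to verify numerically for a plane rotation in two dimensions. The angle and test vector below are arbitrary choices for this sketch.

```r
theta <- pi / 6                            # illustrative angle
Q <- rbind(c(cos(theta), -sin(theta)),
           c(sin(theta),  cos(theta)))     # plane rotation in R^2

x <- c(3, 4)
len_before <- sqrt(sum(x^2))
len_after  <- sqrt(sum((Q %*% x)^2))       # length is preserved

QtQ <- crossprod(Q)                        # Q'Q, the identity up to rounding
d <- svd(Q)$d
kappa2 <- max(d) / min(d)                  # condition number of Q, equal to 1

Q2 <- Q %*% Q                              # product of orthogonal matrices
Q2tQ2 <- crossprod(Q2)                     # again the identity up to rounding
```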
Householder Transformations
There are various ways of obtaining a rotation, such as a plane rotation (e.g., Jacobi or Givens rotations).
Another family of rotations is referred to as the Householder transformations. A Householder transformation has the form
$$H = I - \frac{2}{u'u}\,uu',$$
where $I$ is the identity matrix and $u$ is any vector (of the proper length). By convention, $H = I$ when $u = 0$.
An important application of Householder transformations is to transform matrices to upper triangular form.
Householder for a Single Vector
Let $x$ be an $n$-dimensional vector, and define $u$ by
$$u_i = \begin{cases} 0, & 0 < i < t, \\ x_t + s, & i = t, \\ x_i, & t < i \le n, \end{cases} \qquad \text{with } s = \operatorname{sign}(x_t)\Big(\sum_{j=t}^{n} x_j^2\Big)^{1/2}.$$
Since $u'x = s^2 + x_t s$ and $u'u = 2(s^2 + x_t s)$, it can be shown that
$$Hx = x - 2u(u'x)/(u'u) = x - u,$$
so that $(Hx)_i = x_i$ for $i < t$, $(Hx)_i = 0$ for $i > t$, and $(Hx)_t = -s$. (The sign of $s$ is chosen so that $x_t$ and $s$ will have the same sign.)
Thus, the last $n - t$ components have been set to zero in the transformation $Hx$.
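The construction of $u$ and the effect of $H$ on $x$ can be sketched as follows. The helper name `householder_u` and the example vector are assumptions of this sketch, and sign(0) is treated as +1, a choice the slide does not specify.

```r
# Build the Householder vector u that zeroes components t+1, ..., n of x.
# householder_u is a hypothetical helper used only in this sketch.
householder_u <- function(x, t) {
  n <- length(x)
  sgn <- if (x[t] >= 0) 1 else -1          # treat sign(0) as +1 (assumption)
  s <- sgn * sqrt(sum(x[t:n]^2))
  u <- numeric(n)                          # u_i = 0 for i < t
  u[t] <- x[t] + s
  if (t < n) u[(t + 1):n] <- x[(t + 1):n]
  list(u = u, s = s)
}

x <- c(1, 2, 2, 1, 4)
h <- householder_u(x, t = 2)
u <- h$u
H <- diag(length(x)) - 2 * tcrossprod(u) / sum(u^2)   # H = I - 2uu'/(u'u)
Hx <- drop(H %*% x)                        # equals x - u
```

Here $s = 5$, so the first component is untouched, the second becomes $-s = -5$, and the last three are zero. Giving $s$ the sign of $x_t$ avoids cancellation when forming $x_t + s$.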
Householder for a Matrix
We can perform a series of such transformations on the columns of a matrix in such a way as to leave the transformed matrix in upper triangular form.
The transformation described on the previous slide for $x$, applied to another vector $y$, yields
$$Hy = y - 2u(u'y)/(u'u),$$
so that the first $t - 1$ components of $Hy$ are the same as those of $y$, and the other components are of the form $y_i - f u_i$, where
$$f = 2\sum_{j \ge t} y_j u_j \Big/ \sum_{j \ge t} u_j^2.$$
QR and Least Squares
Recall the problem of obtaining the least squares solution to $y = X\beta$. The motivation for the QR decomposition is that for any $n \times n$ orthogonal matrix $Q$,
$$\|Q'y - Q'X\beta\|_2^2 = \|y - X\beta\|_2^2,$$
so that a $\beta$ minimizing the former will also minimize the latter.
Suppose that we can find a $Q$ such that
$$Q'X = \begin{pmatrix} R_{p \times p} \\ 0_{(n-p) \times p} \end{pmatrix},$$
where $R$ is upper triangular and 0 is a matrix of zeroes.
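QR and Least Squares, continued
Partition the $Q$ described on the previous slide into $Q = (Q_1, Q_2)$, with $Q_1$ containing the first $p$ columns of $Q$ and $Q_2$ containing the other $n - p$ columns. Then
$$\|Q'y - Q'X\beta\|_2^2 = \left\| \begin{pmatrix} Q_1'y - R\beta \\ Q_2'y \end{pmatrix} \right\|_2^2 = \|Q_1'y - R\beta\|_2^2 + \|Q_2'y\|_2^2,$$
so that $\|Q_1'y - R\beta\|_2^2$ is minimized by
$$\hat\beta = R^{-1}Q_1'y,$$
which is the least squares solution.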
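With base R's `qr()`, the pieces in this derivation map onto `qr.R()` for $R$ and `qr.qty()` for $Q'y$. The following sketch uses simulated full-rank data; for rank-deficient $X$, `qr()` may reorder columns, which this example does not handle.

```r
set.seed(4)                                # simulated data, for illustration only
n <- 30
X <- cbind(1, rnorm(n), rnorm(n))
y <- drop(X %*% c(2, 1, -3) + rnorm(n))
p <- ncol(X)

qrX <- qr(X)                               # QR decomposition of X
R   <- qr.R(qrX)                           # p x p upper triangular factor
Qty <- qr.qty(qrX, y)                      # Q'y, a vector of length n

beta_hat <- backsolve(R, Qty[1:p])         # solve R beta = Q_1'y
rss      <- sum(Qty[(p + 1):n]^2)          # ||Q_2'y||^2, the residual sum of squares
```

The first $p$ entries of `Qty` are $Q_1'y$ and the remaining $n - p$ are $Q_2'y$, so the minimized criterion falls out of the same computation.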
Obtaining Q for Least Squares
We can obtain the transformation $Q$ for $X$ using the product of Householder transformations. For example, if $X_j$ represents the $j$th column of $X$, then one way of finding $Q$ requires these steps:
Let $H_1$ be the Householder transformation described previously with $x = X_1$ and $t = 1$. Let $X_j^{(1)}$ be the $j$th column of $H_1X$. Then $X_1^{(1)}$ has all elements except for the first equal to 0.
Next, let $H_2$ be the Householder transformation with $x = X_2^{(1)}$ and $t = 2$, and let $X_j^{(2)}$ be the $j$th column of $H_2H_1X$. Then $X_2^{(2)}$ has all elements except possibly the first two equal to 0. Also, $X_1^{(2)} = X_1^{(1)}$; that is, $H_2$ did not change the first column, so now the first two columns of $H_2H_1X$ are in upper triangular form.
Continuing, at the $k$th stage ($k = 3, \ldots, p$) let $H_k$ be the Householder transformation with $x = X_k^{(k-1)}$ and $t = k$, and let $X_j^{(k)}$ be the $j$th column of $H_k \cdots H_1X$. Then $X_j^{(k)} = X_j^{(k-1)}$ for $j < k$, and the first $k$ columns of the resulting matrix are in upper triangular form.
After the $p$th step, the matrix $H_p \cdots H_1X$ has the form of $Q'X$ defined two slides previous, so that $Q' = H_p \cdots H_1$.
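The stage-by-stage procedure can be sketched in R as below. This is a teaching illustration under stated assumptions: `house_qr` is a hypothetical function name, each $H_k$ is formed as a full $n \times n$ matrix (real implementations store only the vectors $u$), $X$ is assumed to have full column rank, and sign(0) is treated as +1.

```r
# QR by successive Householder transformations: Q' = H_p ... H_1.
# Forms each H_k explicitly for clarity; assumes full column rank.
house_qr <- function(X) {
  n <- nrow(X); p <- ncol(X)
  Qt <- diag(n)                            # accumulates Q'
  for (k in 1:p) {
    x <- X[, k]                            # kth column of the current matrix
    sgn <- if (x[k] >= 0) 1 else -1        # sign(0) taken as +1 (assumption)
    s <- sgn * sqrt(sum(x[k:n]^2))
    u <- numeric(n)
    u[k] <- x[k] + s
    if (k < n) u[(k + 1):n] <- x[(k + 1):n]
    H <- diag(n) - 2 * tcrossprod(u) / sum(u^2)
    X  <- H %*% X                          # X^(k) = H_k X^(k-1)
    Qt <- H %*% Qt
  }
  list(Qt = Qt, R = X[1:p, , drop = FALSE])
}

set.seed(3)                                # simulated data, for illustration only
X <- cbind(1, rnorm(20), rnorm(20))
y <- rnorm(20)
f <- house_qr(X)
beta_hat <- backsolve(f$R, (f$Qt %*% y)[1:3])   # solve R beta = Q_1'y
```

`backsolve()` reads only the upper triangle of `f$R`, so rounding-level residue below the diagonal does not affect the solution.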
Least Squares Quantities and QR
To obtain the least squares estimates, we need $Q_1'y$, which can be computed by applying the Householder transformations to $y$ either during or after they are computed for $X$. Then solve the upper triangular system $R\beta = Q_1'y$. (Note that once we've computed $Q$ for $X$, we can apply it using different $y$'s.)
The error variance is given by
$$\hat\sigma^2 = \|y - X\hat\beta\|_2^2/(n - p) = \|Q'y - Q'X\hat\beta\|_2^2/(n - p) = \|Q_2'y\|_2^2/(n - p).$$
Recall that the diagonal elements of the hat matrix $H = X(X'X)^{-1}X'$ are called the leverage values; they provide a diagnostic for identifying influential observations, or observations that have a relatively large effect on the estimates of the regression coefficients.
Note that the $i$th diagonal element of $H$ is given by $h_{ii} = x_i'(X'X)^{-1}x_i$, where $x_i$ is the covariate vector of the $i$th observation. Since $X = Q_1R$, then $X'X = R'Q_1'Q_1R = R'R$ (note that $Q_1'Q_1 = I_{p \times p}$, but $Q_1Q_1'$ is not an identity matrix). Hence
$$x_i'(X'X)^{-1}x_i = x_i'(R'R)^{-1}x_i = \|(R')^{-1}x_i\|_2^2.$$
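A sketch of how the error variance and the leverage values fall out of the QR factors in R is given below. The data are simulated and the variable names are assumptions; the key step is a single forward substitution with $R'$ applied to all rows of $X$ at once.

```r
set.seed(5)                                # simulated data, for illustration only
n <- 25
X <- cbind(1, rnorm(n), rnorm(n))
y <- rnorm(n)
p <- ncol(X)

qrX <- qr(X)
R   <- qr.R(qrX)

# h_ii = ||(R')^{-1} x_i||^2: solve R'Z = X'; column i of Z is (R')^{-1} x_i
Z   <- forwardsolve(t(R), t(X))
lev <- colSums(Z^2)                        # leverage values h_11, ..., h_nn

# Error variance from the last n - p components of Q'y
sigma2_hat <- sum(qr.qty(qrX, y)[(p + 1):n]^2) / (n - p)
```

The leverages sum to $p$, the trace of the hat matrix, which gives a quick sanity check without ever forming the $n \times n$ matrix $H$.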
Singular Value Decomposition (SVD)
This is generally regarded as the most stable means of solving linear systems.
The SVD has the form
$$X = UDV',$$
where $X$ is an $n \times p$ matrix with $n \ge p$, $U_{n \times p}$ has orthonormal columns, $D_{p \times p}$ is diagonal with $d_{ii} \ge 0$, and $V_{p \times p}$ is orthogonal.
The $d_{ii}$ are called the singular values of $X$. Assume that $d_{11} \ge \cdots \ge d_{pp}$.
(Note: this isn't the only form of the SVD. Another involves an orthogonal $U_{n \times n}$, and $D_{n \times p}$ where $n - p$ rows of zeroes are appended to the $D$ defined above.)
Some Properties of the SVD
Since the columns of $U$ are orthonormal, $U'U = I_{p \times p}$ (although $UU' \neq I_{n \times n}$, unless $n = p$).
Since $X'X = VDU'UDV' = VD^2V'$, it follows that the columns of $V$ are eigenvectors of $X'X$ and that the $d_{ii}^2$ are the corresponding eigenvalues.
If $X$ is a square $p \times p$ nonsingular matrix, then both $U$ and $V$ are orthogonal matrices, and $X^{-1} = (V')^{-1}D^{-1}U^{-1} = VD^{-1}U'$. So once the SVD is computed, inverting the matrix $X$ really only requires inverting a diagonal matrix.
For a general $n \times p$ matrix $X$ with SVD $UDV'$, $\operatorname{rank}(X)$ = the number of nonzero $d_{ii}$.
A generalized inverse of $X$ is any matrix $G$ satisfying $XGX = X$. Let $D^+$ be the diagonal matrix with elements $d_{ii}^+ = 1/d_{ii}$ if $d_{ii} > 0$, and $d_{ii}^+ = 0$ if $d_{ii} = 0$. Then a particular generalized inverse for $X$ is given by
$$X^+ = VD^+U'.$$
This particular inverse is called the Moore-Penrose generalized inverse.
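The rank and generalized-inverse statements can be illustrated with base R's `svd()`, whose components `u`, `d`, and `v` correspond to $U$, the diagonal of $D$, and $V$. The matrix below is simulated and deliberately rank deficient; the tolerance used to decide which singular values count as zero is an assumption of this sketch, since in floating point they are rarely exactly zero.

```r
set.seed(6)                                # simulated data, for illustration only
x1 <- rnorm(5); x2 <- rnorm(5)
X <- cbind(x1, x2, x1 + x2)                # 5 x 3 matrix of rank 2

s <- svd(X)                                # X = U D V'
tol <- max(dim(X)) * max(s$d) * .Machine$double.eps   # zero threshold (assumption)
rank_X <- sum(s$d > tol)                   # number of nonzero singular values

d_plus <- ifelse(s$d > tol, 1 / s$d, 0)    # d_ii^+ as defined above
X_plus <- s$v %*% diag(d_plus) %*% t(s$u)  # Moore-Penrose inverse V D^+ U'

check <- X %*% X_plus %*% X                # should reproduce X
```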
Computing the SVD is somewhat complicated. It involves finding orthogonal matrices $U_e$ and $V$ such that the upper $p \times p$ block of $U_e'XV$ is a diagonal matrix, with the rest of the matrix consisting of zeroes.
We must proceed with alternating rows and columns, building Householder transformations $U_h'XV_h = B$, where $B$ is in bidiagonal form with nonzero elements $b_{ii}$ and $b_{i,i+1}$, $i = 1, \ldots, p$.
We then use an iterative algorithm to find the singular values and transformations $U_b$ and $V_b$ such that $U_b'BV_b = D$.
Details are in Numerical Recipes (either for C or Fortran) by Press et al.
SVD and Least Squares
For our same least squares problem, if $\operatorname{rank}(X) = p$ and $UDV'$ is the SVD of $X$, then $X'X = VD^2V'$. The least squares solution then is
$$\hat\beta = (X'X)^{-1}X'y = VD^{-2}V'VDU'y = VD^{-1}U'y.$$
Once we have the SVD, finding the least squares solution involves applying the orthogonal transformations used for $U$ to $y$, and inverting the diagonal matrix $D$, along with some additional matrix multiplication.
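A minimal sketch of this formula in R, using simulated full-rank data (names and values are assumptions of the example):

```r
set.seed(7)                                # simulated data, for illustration only
n <- 40
X <- cbind(1, rnorm(n), rnorm(n))
y <- drop(X %*% c(0.5, 1, 2) + rnorm(n))

s <- svd(X)                                # X = U D V'
Uty <- crossprod(s$u, y)                   # U'y
beta_hat <- drop(s$v %*% (Uty / s$d))      # V D^{-1} U'y
```

Dividing `Uty` elementwise by `s$d` applies $D^{-1}$ without forming a diagonal matrix. When some singular values are zero or nearly so, replacing `1 / s$d` by the $D^+$ entries gives the minimum-norm solution instead.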