Numerical Linear Algebra


Numerical Linear Algebra
Numerical linear algebra has numerous applications in statistics, particularly in the fitting of linear models.
Notation and conventions: Elements of a matrix A are denoted by a_ij, where i indexes the rows and j the columns. A' denotes the transpose of A. If B = A', then b_ij = a_ji. A vector x is a column vector, and x' is a row vector.

Square Matrices
The main diagonal of a square matrix A consists of the elements a_ii. Sub-diagonal elements are those below the main diagonal (the a_ij with i > j). Super-diagonal elements are the a_ij with i < j. A is symmetric if a_ij = a_ji for all i and j. An upper triangular matrix has all sub-diagonal elements equal to 0. A lower triangular matrix has all super-diagonal elements equal to 0. A diagonal matrix has all elements equal to 0 except for the elements on the main diagonal. An identity matrix is a diagonal matrix I with 1's on the main diagonal. A symmetric matrix A is positive definite if x'Ax > 0 for all x ≠ 0.

Matrix Operations in R
In R, a matrix is an array with two subscripts. However, there are utilities in R that apply only to matrices, so be careful about using the appropriate object type when you want to work with matrices.
> # create a matrix:
> A=cbind(c(1,2,3),c(4,5,6),c(7,8,9))
> A
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9
> is.matrix(A)
[1] TRUE
> # alternative way:
> A=matrix(c(1:9),nrow=3,ncol=3)
> A
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9

More Matrix Operations in R
> # get the transpose:
> t(A)
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9
> # multiply matrices:
> A%*%c(1,0,0)
     [,1]
[1,]    1
[2,]    2
[3,]    3
> c(1,0,0)%*%A
     [,1] [,2] [,3]
[1,]    1    4    7
> cbind(c(1,0,0))%*%A
Error in cbind(c(1, 0, 0)) %*% A : non-conformable arguments
> t(cbind(c(1,0,0)))%*%A
     [,1] [,2] [,3]
[1,]    1    4    7

More Matrix Operations in R
> # addition/subtraction:
> A-matrix(2,nrow=3,ncol=3)
     [,1] [,2] [,3]
[1,]   -1    2    5
[2,]    0    3    6
[3,]    1    4    7
> # element-wise -- as opposed to matrix -- multiplication:
> A*matrix(2,nrow=3,ncol=3)
     [,1] [,2] [,3]
[1,]    2    8   14
[2,]    4   10   16
[3,]    6   12   18
> A%*%matrix(2,nrow=3,ncol=3)
     [,1] [,2] [,3]
[1,]   24   24   24
[2,]   30   30   30
[3,]   36   36   36

More Matrix Operations in R
> # symmetric matrices and eigenvalues:
> S=cbind(c(1,2,3),c(2,1,2),c(3,2,1))
> S
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    2    1    2
[3,]    3    2    1
> eigen(S)
$values
[1]  5.7015621 -0.7015621 -2.0000000
$vectors
          [,1]       [,2]          [,3]
[1,] 0.6059128  0.3645129  7.071068e-01
[2,] 0.5154991 -0.8568901  2.757402e-16
[3,] 0.6059128  0.3645129 -7.071068e-01

Available Libraries in R
See R_ext/Linpack.h for details about the BLAS, LINPACK, and EISPACK libraries of FORTRAN subroutines. For some description of these three libraries, see http://www.netlib.org/lapack. From Writing R Extensions: These are expressed as calls to FORTRAN subroutines, and they will also be usable from users' FORTRAN code.

Solving Systems of Equations
Consider solving the system Ax = b for x, given A and b. In scalar terms, this involves solving
\sum_{j=1}^p a_ij x_j = b_i,   i = 1, ..., p,
for x_1, ..., x_p. It's generally better to calculate A^{-1}b by solving the system Ax = b than to calculate A^{-1} directly and then multiply.

solve() in R
> A=cbind(c(2,1,2),c(8,3,7),c(3,2,4))
> A
     [,1] [,2] [,3]
[1,]    2    8    3
[2,]    1    3    2
[3,]    2    7    4
> b=cbind(c(2,5,8))
> # the solve() function actually uses the QR factorization:
> x=solve(A,b)
> x
     [,1]
[1,]    3
[2,]   -2
[3,]    4
> A%*%x
     [,1]
[1,]    2
[2,]    5
[3,]    8

solve() in R, continued
> # to obtain the inverse of A:
> Ai=solve(A)
> Ai
     [,1] [,2] [,3]
[1,]    2   11   -7
[2,]    0   -2    1
[3,]   -1   -2    2
> # getting x directly, with the inverse of A (not a good idea):
> x=Ai%*%b
> x
     [,1]
[1,]    3
[2,]   -2
[3,]    4

Upper- and Lower-Triangular Matrices
For an upper triangular A, the system Ax = b can be written as
a_11 x_1 + a_12 x_2 + ... + a_1p x_p = b_1
          a_22 x_2 + ... + a_2p x_p = b_2
                                 ...
                          a_pp x_p = b_p
This can be readily solved with backward substitution, starting with the last equation:
x_p = b_p / a_pp
x_{p-1} = (b_{p-1} - a_{p-1,p} x_p) / a_{p-1,p-1}
...
x_1 = (b_1 - a_12 x_2 - ... - a_1p x_p) / a_11
There is an analogous forward substitution algorithm for lower triangular systems.
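
As an illustration, here is a minimal R sketch of backward substitution (back_sub is a made-up name; base R's backsolve(), shown on the next slide, is what one would use in practice):
# back_sub: solve Ux = b for upper triangular U by backward substitution
# (illustrative sketch only -- use backsolve() in practice)
back_sub <- function(U, b) {
  p <- length(b)
  x <- numeric(p)
  x[p] <- b[p] / U[p, p]
  if (p > 1) {
    for (i in (p - 1):1) {
      # subtract the contributions of the already-solved components, then divide by the pivot
      x[i] <- (b[i] - sum(U[i, (i + 1):p] * x[(i + 1):p])) / U[i, i]
    }
  }
  x
}
# example: the triangular system used on the next slide
U <- rbind(c(1, 2, 3), c(0, 1, 1), c(0, 0, 2))
back_sub(U, c(8, 4, 2))   # should agree with backsolve(U, c(8, 4, 2))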

Triangular Matrices in R
> A
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    0    1    1
[3,]    0    0    2
> b=c(8,4,2)
> x=backsolve(A,b)
> x
[1] -1  3  1

Gaussian Elimination
Recall that Gaussian elimination (GE) involves augmenting the matrix A with an additional column containing b, followed by these steps:
1) We first reduce the A portion of this matrix to upper triangular form using elementary row operations.
2) We next work in reverse, starting from the last row and working our way up, reducing the A portion to an identity matrix.
What remains in the last column is the solution x. Try this for the system in the R solve() example.
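
A minimal R sketch of these two passes, written for clarity rather than efficiency (gauss_solve is a made-up name, and no pivoting is done, so nonzero pivots are assumed):
# gauss_solve: Gaussian elimination on the augmented matrix [A | b]
gauss_solve <- function(A, b) {
  p <- nrow(A)
  Ab <- cbind(A, b)                      # augment A with b
  # forward pass: reduce the A portion to upper triangular form
  for (k in 1:(p - 1)) {
    for (i in (k + 1):p) {
      m <- Ab[i, k] / Ab[k, k]           # multiplier for row i
      Ab[i, ] <- Ab[i, ] - m * Ab[k, ]
    }
  }
  # backward pass: reduce the A portion to the identity
  for (k in p:1) {
    Ab[k, ] <- Ab[k, ] / Ab[k, k]
    if (k > 1) for (i in (k - 1):1) Ab[i, ] <- Ab[i, ] - Ab[i, k] * Ab[k, ]
  }
  Ab[, p + 1]                            # the last column is the solution x
}
# the system from the R solve() example:
A <- cbind(c(2, 1, 2), c(8, 3, 7), c(3, 2, 4))
gauss_solve(A, c(2, 5, 8))               # 3 -2 4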

The LU Decomposition
Note that Gaussian elimination can be viewed simply as the factoring of A into the product of a lower triangular matrix L and an upper triangular matrix U. The matrix U is the matrix left in the A portion of the augmented matrix used for GE when A is reduced to upper triangular form. The sub-diagonal elements of the matrix L represent the multipliers used at each stage of GE. The diagonal elements of L are all 1's.

Computing the LU Decomposition
Explicit formulas for the elements of U and L are given by
u_ij = a_ij - \sum_{k=1}^{i-1} l_ik u_kj,   i = 1, ..., j;
l_ij = ( a_ij - \sum_{k=1}^{j-1} l_ik u_kj ) / u_jj,   i = j+1, ..., p.
Once L and U are computed, we solve Ax = LUx = b by first using a forward substitution to solve for y in Ly = b, and then using a backward substitution to solve for x in Ux = y.
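
A minimal R sketch of these formulas (Doolittle form, no pivoting; lu_decomp is a made-up name, and in practice one would use solve() or the factorizations in the Matrix package):
# lu_decomp: LU factorization with unit lower triangular L, following the formulas above
# (sketch only; assumes no pivoting is needed)
lu_decomp <- function(A) {
  p <- nrow(A)
  L <- diag(p)
  U <- matrix(0, p, p)
  for (j in 1:p) {
    for (i in 1:j)                     # u_ij, i = 1, ..., j
      U[i, j] <- A[i, j] - sum(L[i, seq_len(i - 1)] * U[seq_len(i - 1), j])
    if (j < p) for (i in (j + 1):p)    # l_ij, i = j+1, ..., p
      L[i, j] <- (A[i, j] - sum(L[i, seq_len(j - 1)] * U[seq_len(j - 1), j])) / U[j, j]
  }
  list(L = L, U = U)
}
# solve Ax = b via Ly = b (forward) and then Ux = y (backward):
A <- cbind(c(2, 1, 2), c(8, 3, 7), c(3, 2, 4))
f <- lu_decomp(A)
y <- forwardsolve(f$L, c(2, 5, 8))
x <- backsolve(f$U, y)                 # 3 -2 4
prod(diag(f$U))                        # equals det(A), as noted on the next slide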

Advantages of the LU Approach
No additional computations are needed (beyond what's required for GE). Solutions for any right-hand vector b can be computed without redoing the GE; b is not needed when A is factored. The LU factorization also yields other useful quantities; e.g., det(A) is the product of the diagonal elements of U, and each of the columns of A^{-1} can be computed by taking b to be the corresponding column of the p × p identity matrix.

Vector Norms
Vector and matrix norms play an important role in error analysis. A norm typically measures, in some sense, the magnitude of its argument. For a real number, this is ordinarily the absolute value. For a vector x = (x_1, x_2, ..., x_p)', three common choices are
the 1-norm, or L_1, defined by ||x||_1 = \sum_{i=1}^p |x_i|,
the 2-norm, or L_2, defined by ||x||_2 = ( \sum_{i=1}^p x_i^2 )^{1/2},
and the ∞-norm, or L_∞, defined by ||x||_∞ = max_i |x_i|.
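
These are immediate to compute directly in R, for example:
# the three common vector norms for x = (3, -4, 12)'
x <- c(3, -4, 12)
sum(abs(x))          # 1-norm: 19
sqrt(sum(x^2))       # 2-norm: 13
max(abs(x))          # infinity-norm: 12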

Matrix Norms
To generalize these norms to matrices, a useful (but not unique) method is to define corresponding matrix norms from the vector norms through
||A||_j = sup_{x ≠ 0} ||Ax||_j / ||x||_j,   for j = 1, 2, ∞.
This yields
||A||_1 = max_j \sum_i |a_ij|   (the maximum absolute column sum),
||A||_∞ = max_i \sum_j |a_ij| = ||A'||_1   (the maximum absolute row sum),
and a value of ||A||_2 equal to the largest singular value of A.
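
A short R check of these formulas against base R's norm():
# matrix norms: explicit formulas vs. norm()
A <- cbind(c(2, 1, 2), c(8, 3, 7), c(3, 2, 4))
max(colSums(abs(A)))      # ||A||_1, the maximum absolute column sum
norm(A, type = "O")       # the same ("O" = one-norm)
max(rowSums(abs(A)))      # ||A||_inf, the maximum absolute row sum
norm(A, type = "I")
max(svd(A)$d)             # ||A||_2, the largest singular value
norm(A, type = "2")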

Condition Numbers
The condition number of a square matrix A is defined to be
κ_j(A) = ||A||_j ||A^{-1}||_j,
which is taken to be ∞ if A is singular.
Some remarks: The lower bound of the condition number is 1. The condition number yields a useful measure of how close a matrix is to singularity. When solving a system Ax = b, it turns out that the relative error of the computed solution is proportional to κ_j(A).
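
In R, condition numbers can be examined along these lines (kappa() returns a cheap approximation by default; exact = TRUE uses the SVD):
# condition numbers of the matrix from the solve() example
A <- cbind(c(2, 1, 2), c(8, 3, 7), c(3, 2, 4))
norm(A, "1") * norm(solve(A), "1")   # 1-norm condition number
kappa(A, exact = TRUE)               # exact 2-norm condition number
kappa(A)                             # default: an approximation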

Matrices and Linear Regression
In statistical applications, we often run into problems of the form
y_i = \sum_{j=1}^p x_ij β_j + ε_i,
where the y_i are the responses, the x_ij are the covariates, the β_j are the regression coefficients, and the ε_i represent the error terms. If the ε_i can be assumed to be independent random variables with mean 0 and variance σ^2, then we often use the least squares estimators of the β_j.

The Least Squares Approach
The least squares solution is the vector β = (β_1, ..., β_p)' that minimizes
||y - Xβ||^2 = (y - Xβ)'(y - Xβ),
where y = (y_1, ..., y_n)' and X = (x_ij), with x_i1 = 1 for all i if an intercept term is included. Note that the solution β̂ = (β̂_1, ..., β̂_p)' gives the vector of fitted values ŷ = Xβ̂ that is closest (in the Euclidean norm) to the actual responses.

The Least Squares Solution
An obvious way of obtaining the solution is to set the gradient of ||y - Xβ||^2 to zero, obtaining the normal equations
X'X β = X'y.
This system can be solved using the methods described previously. In particular, since X'X is positive definite for full-rank X, the Cholesky decomposition (a special case of the LU factorization for positive definite matrices) can be very efficient.
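
A minimal sketch of this normal-equations/Cholesky route in R, on made-up data (illustration only; in practice lm() or a QR-based solver, discussed below, is preferable):
# least squares via the normal equations and the Cholesky factor
set.seed(1)
n <- 20; p <- 3
X <- cbind(1, matrix(rnorm(n * (p - 1)), n, p - 1))   # includes an intercept column
y <- drop(X %*% c(1, 2, -1) + rnorm(n))
R <- chol(crossprod(X))                    # X'X = R'R, with R upper triangular
z <- forwardsolve(t(R), crossprod(X, y))   # solve R'z = X'y
beta_hat <- backsolve(R, z)                # solve R beta = z
cbind(beta_hat, coef(lm(y ~ X - 1)))       # should agree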

Computational Considerations
We often want a variety of different models fit (e.g., stepwise regression), so it would be good to have a fast method for updating the fitted model when covariates are added or dropped. Along with the solution, we may also want other quantities such as:
Residuals
Fitted values
Regression and error sums of squares
Diagnostic measures (e.g., the diagonal elements of the projection operator X(X'X)^{-1}X', the so-called hat matrix)

Other Options
With the practical considerations outlined on the previous slide, two very efficient techniques, the QR decomposition and the singular value decomposition (SVD), involve decomposing X directly.
Advantages: It turns out that factoring X directly is a better-conditioned problem than factoring X'X. QR or SVD also allows us to add and drop covariates more easily, without much additional work.

Rotations and Orthogonal Matrices
A rotation in R^p is a linear transformation Q: R^p → R^p such that ||Qx|| = ||x|| for all x in R^p. A rotation does not affect the length of vectors, but changes their orientation; it can be thought of as a change in the coordinate axes, without a change in vector length.

Properties of a Rotation Q
From ||Qx|| = ||x|| for all x, it follows that x'Q'Qx = x'x, so that x'(Q'Q - I)x = 0 for all x. This is only true if Q'Q = I, since Q'Q - I is symmetric.
For square matrices only, Q'Q = I implies QQ' = I. So Q' = Q^{-1}, and Q must be of full rank. Therefore, any x in R^p can be represented as Qy for some y in R^p.
When Q'Q = I, the columns of Q are mutually orthogonal and each has unit length. For square matrices, either of Q'Q = I and QQ' = I implies the other; a square matrix satisfying these properties is said to be orthogonal.
If Q is a rotation, then ||Q||_2 = 1. Since Q^{-1} = Q' is also a rotation, ||Q^{-1}||_2 = 1 as well, so that κ_2(Q) = 1.
If Q_1 and Q_2 are orthogonal matrices, then (Q_1Q_2)'(Q_1Q_2) = Q_2'Q_1'Q_1Q_2 = Q_2'Q_2 = I, so Q_1Q_2 is also orthogonal.
Because of these characteristics, any rotation is given by an orthogonal matrix, and vice versa.
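
These properties are easy to verify numerically, for instance with a Q taken from the QR decomposition of a random square matrix (a sketch):
# numerical check of the rotation/orthogonality properties
set.seed(1)
Q <- qr.Q(qr(matrix(rnorm(9), 3, 3)))      # a 3 x 3 orthogonal matrix
round(crossprod(Q), 10)                    # Q'Q = I (up to rounding)
x <- rnorm(3)
c(sqrt(sum(x^2)), sqrt(sum((Q %*% x)^2)))  # ||x|| and ||Qx|| agree
kappa(Q, exact = TRUE)                     # condition number is 1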

Householder Transformations
There are various ways of obtaining a rotation, such as a plane rotation (e.g., Jacobi or Givens rotations). Another family of rotations is referred to as the Householder transformations. These have the form
H = I - 2uu'/(u'u),
where I is the identity matrix and u is any vector (of the proper length). By convention, H = I when u = 0. An important application of Householder transformations is to transform matrices to upper triangular form.

Householder for a Single Vector
Let x be an n-dimensional vector, and define u by
u_i = 0 for i < t,   u_t = x_t + s,   u_i = x_i for t < i ≤ n,
with s = sign(x_t) ( \sum_{j=t}^n x_j^2 )^{1/2}.
Then it can be shown that Hx = x - 2u(u'x)/(u'u) = x - u, so that (Hx)_i = x_i for i < t, (Hx)_i = 0 for i > t, and (Hx)_t = -s. (The sign of s is chosen so that x_t and s have the same sign, which avoids cancellation when forming u_t.) Thus, the last n - t components have been set to zero by the transformation Hx.
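
A minimal R sketch of this construction (house_u and apply_house are made-up names; the sketch assumes x_t is nonzero):
# house_u: build the Householder vector u that zeroes components t+1, ..., n of x
house_u <- function(x, t) {
  n <- length(x)
  s <- sign(x[t]) * sqrt(sum(x[t:n]^2))     # s has the same sign as x_t
  u <- numeric(n)
  u[t] <- x[t] + s
  if (t < n) u[(t + 1):n] <- x[(t + 1):n]
  u
}
# apply H = I - 2uu'/(u'u) to a vector y without forming H explicitly
apply_house <- function(u, y) y - 2 * u * sum(u * y) / sum(u^2)
x <- c(3, 1, 5, 1, 1)
u <- house_u(x, t = 2)
round(apply_house(u, x), 10)   # components 3, 4, 5 are zeroed; component 2 becomes -s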

Householder for a Matrix
We can perform a series of such transformations on the columns of a matrix in such a way as to leave the transformed matrix in upper triangular form. The transformation described on the previous slide for x, applied to another vector y, yields
Hy = y - 2u(u'y)/(u'u),
so that the first t - 1 components of Hy are the same as those of y, and the other components are of the form y_i - f u_i, where f = 2 \sum_{j=t}^n y_j u_j / \sum_{j=t}^n u_j^2.

QR and Least Squares
Recall the problem of obtaining the least squares solution to y = Xβ. The motivation for the QR decomposition is that for any n × n orthogonal matrix Q,
||Q'y - Q'Xβ||_2 = ||y - Xβ||_2,
so that a β minimizing the former will also minimize the latter. Suppose that we can find a Q such that
Q'X = [ R ; 0 ]   (an n × p matrix, with R stacked on top of 0),
where R is a p × p upper triangular matrix and 0 is an (n - p) × p matrix of zeroes.

QR and Least Squares, continued
Partition the Q described on the previous slide as Q = (Q_1, Q_2), with Q_1 containing the first p columns of Q and Q_2 containing the other n - p columns. Then
||Q'y - Q'Xβ||^2 = ||Q_1'y - Rβ||^2 + ||Q_2'y||^2,
so that this is minimized by taking Rβ = Q_1'y, i.e.,
β̂ = R^{-1} Q_1'y,
which is the least squares solution.
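
This is exactly what R's qr(), qr.Q(), qr.R(), and backsolve() provide; a sketch on made-up data (qr.coef() wraps the same steps):
# QR-based least squares: beta_hat = R^{-1} Q_1'y
set.seed(1)
n <- 20; p <- 3
X <- cbind(1, matrix(rnorm(n * (p - 1)), n, p - 1))
y <- drop(X %*% c(1, 2, -1) + rnorm(n))
qrX <- qr(X)
Q1 <- qr.Q(qrX)                              # the n x p matrix Q_1
R  <- qr.R(qrX)                              # the p x p upper triangular R
beta_hat <- backsolve(R, crossprod(Q1, y))   # solve R beta = Q_1'y
cbind(beta_hat, qr.coef(qrX, y))             # should agree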

Obtaining Q for Least Squares
We can obtain the transformation Q for X using a product of Householder transformations. For example, if X_j represents the jth column of X, then one way of finding Q requires these steps:
Let H_1 be the Householder transformation described previously with x = X_1 and t = 1, and let X^(1)_j be the jth column of H_1X. Then X^(1)_1 has all elements except the first equal to 0.
Next, let H_2 be the Householder transformation with x = X^(1)_2 and t = 2, and let X^(2)_j be the jth column of H_2H_1X. Then X^(2)_2 has all elements except possibly the first two equal to 0. Also, X^(2)_1 = X^(1)_1; that is, H_2 did not change the first column, so now the first two columns of H_2H_1X are in upper triangular form.
Continuing, at the kth stage (k = 3, ..., p), let H_k be the Householder transformation with x = X^(k-1)_k and t = k, and let X^(k)_j be the jth column of H_k···H_1X. Then X^(k)_j = X^(k-1)_j for j < k, and the first k columns of the resulting matrix are in upper triangular form.
After the pth step, the matrix H_p···H_1X has the form of Q'X defined two slides previous.

Least Squares Quantities and QR
To obtain the least squares estimates, we need Q_1'y, which can be computed by applying the Householder transformations to y, either while they are computed for X or afterwards. Then solve the upper triangular system Rβ = Q_1'y. (Note that once we've computed Q for X, we can apply it to different y's.)
The error variance estimate is given by
σ̂^2 = ||y - Xβ̂||^2 / (n - p) = ||Q'y - Q'Xβ̂||^2 / (n - p) = ||Q_2'y||^2 / (n - p).
Recall that the diagonal elements of the hat matrix H = X(X'X)^{-1}X' are called the leverage values; they provide a diagnostic for identifying influential observations, i.e., observations that have a relatively large effect on the estimates of the regression coefficients. Note that the ith diagonal element of H is given by h_ii = x_i'(X'X)^{-1}x_i, where x_i is the covariate vector of the ith observation. Since X = Q_1R, X'X = R'Q_1'Q_1R = R'R (note that Q_1'Q_1 = I_{p×p}, but Q_1Q_1' is not an identity matrix). Hence h_ii = x_i'(X'X)^{-1}x_i = x_i'(R'R)^{-1}x_i = ||(R')^{-1}x_i||^2.
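
A small numerical check of the leverage formula h_ii = ||(R')^{-1}x_i||^2, on made-up data (hatvalues() on an lm fit returns these values directly):
# leverage values via the QR factor R
set.seed(1)
n <- 20; p <- 3
X <- cbind(1, matrix(rnorm(n * (p - 1)), n, p - 1))
R <- qr.R(qr(X))
W <- forwardsolve(t(R), t(X))     # column i of W is (R')^{-1} x_i
h <- colSums(W^2)                 # the leverage values h_ii
range(h - diag(X %*% solve(crossprod(X)) %*% t(X)))   # essentially zero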

Singular Value Decomposition (SVD)
This is regarded as the most stable means of solving linear systems. The SVD has the form
X = UDV',
where X is an n × p matrix with n > p, U (n × p) has orthonormal columns, D (p × p) is diagonal with d_ii ≥ 0, and V (p × p) is orthogonal. The d_ii are called the singular values of X. Assume that d_11 ≥ d_22 ≥ ... ≥ d_pp. (Note: this isn't the only form of the SVD. Another involves an orthogonal n × n U and an n × p D, where n - p rows of zeroes are appended to the D defined above.)

Some Properties of the SVD
Since the columns of U are orthonormal, U'U = I_{p×p} (although UU' ≠ I_{n×n} unless n = p).
Since X'X = VDU'UDV' = VD^2V', it follows that the columns of V are eigenvectors of X'X and that the d_ii^2 are the corresponding eigenvalues.
If X is a square p × p nonsingular matrix, then both U and V are orthogonal matrices, and X^{-1} = (V')^{-1}D^{-1}U^{-1} = VD^{-1}U'. So once the SVD is computed, inverting the matrix X really only requires inverting a diagonal matrix.
For a general n × p matrix X with SVD UDV', rank(X) = the number of nonzero d_ii.
A generalized inverse of X is any matrix G satisfying XGX = X. Let D^+ be the diagonal matrix with elements d_ii^+ = 1/d_ii if d_ii > 0, and d_ii^+ = 0 if d_ii = 0. Then a particular generalized inverse for X is given by X^+ = VD^+U'. This particular inverse is called the Moore-Penrose generalized inverse.
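
A small R sketch of this generalized inverse built from svd() (MASS::ginv() does essentially the same thing; mp_inverse and the tolerance are made up for the example):
# Moore-Penrose generalized inverse from the SVD
mp_inverse <- function(X, tol = 1e-10) {
  s <- svd(X)                               # X = U D V'
  d_plus <- ifelse(s$d > tol, 1 / s$d, 0)   # invert only the nonzero singular values
  s$v %*% (d_plus * t(s$u))                 # V D^+ U'
}
X <- cbind(c(1, 2, 3), c(2, 4, 6), c(1, 0, 1))   # rank 2: column 2 = 2 * column 1
G <- mp_inverse(X)
round(X %*% G %*% X - X, 10)               # XGX = X, so this is (numerically) zero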

Computing the SVD
Computing the SVD is somewhat complicated. It involves finding orthogonal matrices U_e and V such that the upper p × p block of U_e'XV is a diagonal matrix, with the rest of the matrix consisting of zeroes. We proceed by alternating over rows and columns, building up Householder transformations U_h and V_h such that U_h'XV_h = B, where B is in bidiagonal form with nonzero elements b_ii and b_{i,i+1}, i = 1, ..., p. We then use an iterative algorithm to find the singular values and transformations U_b and V_b such that U_b'BV_b = D. Details are in Numerical Recipes (either the C or the Fortran version) by Press et al.

SVD and Least Squares
For our same least squares problem, if rank(X) = p and UDV' is the SVD of X, then X'X = VDU'UDV' = VD^2V'. The least squares solution is then
β̂ = (X'X)^{-1}X'y = VD^{-2}V'VDU'y = VD^{-1}U'y.
Once we have the SVD, finding the least squares solution involves applying the orthogonal transformations used for U to y and inverting the diagonal matrix D, along with some additional matrix multiplication.
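
A brief R sketch of the SVD route on made-up data (compare with the QR version earlier; lm.fit() is what one would use in practice):
# least squares via the SVD: beta_hat = V D^{-1} U'y
set.seed(1)
n <- 20; p <- 3
X <- cbind(1, matrix(rnorm(n * (p - 1)), n, p - 1))
y <- drop(X %*% c(1, 2, -1) + rnorm(n))
s <- svd(X)                                  # X = U D V', with d = diag(D)
beta_hat <- s$v %*% ((1 / s$d) * crossprod(s$u, y))
cbind(beta_hat, coef(lm(y ~ X - 1)))         # should agree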