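Lecture #5
The Linear Least Squares Problem

Let $X \in \mathbb{R}^{m \times n}$, $m \ge n$, be such that $\operatorname{rank}(X) = n$. That is,
$$Xy = 0 \iff y = 0.$$
The problem is to find $y_{LS}$ such that
$$\|b - Xy_{LS}\|_2 = \min_{y \in \mathbb{R}^n} \|b - Xy\|_2. \tag{1}$$
We also want the residual
$$r_{LS} = b - Xy_{LS}.$$
Our approach is to compute the QR decomposition, that is,
$$X = Q \begin{pmatrix} R \\ 0 \end{pmatrix} \begin{matrix} n \\ m-n \end{matrix},$$
where $Q \in \mathbb{R}^{m \times m}$ is orthogonal and $R \in \mathbb{R}^{n \times n}$ is upper triangular and nonsingular. That yields a procedure of the form
1. Use [Q,R] = qr(X) to get $Q$ and $R$.
2. Compute
$$c = \begin{pmatrix} c_1 \\ c_2 \end{pmatrix} \begin{matrix} n \\ m-n \end{matrix} = Q^T b.$$
3. Solve $Ry_{LS} = c_1$ and compute
$$r_{LS} = Q \begin{pmatrix} 0 \\ c_2 \end{pmatrix}.$$
This works because $Q$ is orthogonal: $\|b - Xy\|_2^2 = \|Q^Tb - Q^TXy\|_2^2 = \|c_1 - Ry\|_2^2 + \|c_2\|_2^2$, which is smallest exactly when $Ry = c_1$, and then $b - Xy_{LS} = Q\begin{pmatrix} c_1 - Ry_{LS} \\ c_2 \end{pmatrix} = Q\begin{pmatrix} 0 \\ c_2 \end{pmatrix}$.

These come from the following MATLAB commands: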
    % Assumes X (m-by-n with rank n) and b (m-by-1) are already defined
    [m,n] = size(X);                      % Find dimensions of X
    [Q,R] = qr(X);                        % Compute orthogonal factorization
    c = Q'*b;                             % c = Q^T b
    yls = R(1:n,1:n)\c(1:n);              % Solve R*y_LS = c_1
    if m > n
       rls = Q*[zeros(n,1); c(n+1:m)];    % r_LS = Q*[0; c_2]
       % More efficient is rls = Q(:,n+1:m)*c(n+1:m), if we actually have Q
    else
       rls = zeros(m,1);
    end

Note that
$$X^T r_{LS} = \begin{pmatrix} R^T & 0 \end{pmatrix} Q^T Q \begin{pmatrix} 0 \\ c_2 \end{pmatrix} = \begin{pmatrix} R^T & 0 \end{pmatrix} \begin{pmatrix} 0 \\ c_2 \end{pmatrix} = 0.$$
Thus $r_{LS}$ is orthogonal to the columns of $X$.

There are three well-known ways to construct a QR decomposition:
- Householder-Golub factorization
- Modified Gram-Schmidt orthogonalization
- Givens QR factorization

We will say a lot about the first two and just touch on the third. MATLAB uses the first, and we will start with that.

The matrix
$$H = I - 2ww^T, \qquad \|w\|_2 = 1,$$
is called a Householder transformation. Often it is defined in terms of any nonzero vector $v$ as
$$H = I - \frac{2vv^T}{v^Tv}.$$
As you will show, $H = H^T$, $H^TH = I$, and $H^2 = I$.
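The three identities can be checked numerically (a sanity check, not the proof you are asked for). The Python sketch below forms $H = I - 2vv^T/(v^Tv)$ explicitly for one sample $v$ and compares $H$, $H^TH$, and $H^2$ against $H^T$ and $I$. It uses plain lists rather than MATLAB so that it runs with the Python standard library alone; the helper names (`householder_matrix`, `matmul`, `transpose`, `close`), the sample vector, and the tolerance are illustrative choices.

```python
def householder_matrix(v):
    """Form H = I - 2 v v^T / (v^T v) explicitly for a nonzero vector v."""
    vtv = sum(t * t for t in v)
    n = len(v)
    return [[(1.0 if i == j else 0.0) - 2.0 * v[i] * v[j] / vtv
             for j in range(n)] for i in range(n)]


def matmul(A, B):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A):
    return [list(row) for row in zip(*A)]


def close(A, B, tol=1e-12):
    """Entrywise comparison up to rounding error."""
    return all(abs(a - b) <= tol for ra, rb in zip(A, B) for a, b in zip(ra, rb))


v = [1.0, -2.0, 3.0, 0.5]                       # any nonzero vector
H = householder_matrix(v)
I = [[1.0 if i == j else 0.0 for j in range(len(v))] for i in range(len(v))]

print(close(H, transpose(H)))                   # H = H^T
print(close(matmul(transpose(H), H), I))        # H^T H = I
print(close(matmul(H, H), I))                   # H^2 = I
```

Forming $H$ as a full matrix is done here only for the check; for a vector of length $m$ it costs $O(m^2)$ storage.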
It is common to choose $H$ such that, for a given vector $x$,
$$Hx = \alpha e_1.$$
Since $H$ is orthogonal,
$$\|Hx\|_2 = \|x\|_2 = \|\alpha e_1\|_2 = |\alpha|.$$
We use this transformation to insert zeros into a matrix. The choice of $w$ is
$$w = \frac{x - \alpha e_1}{\|x - \alpha e_1\|_2}.$$
We note that
$$\|x - \alpha e_1\|_2^2 = x^Tx + \alpha^2 e_1^Te_1 - 2\alpha x^Te_1 = x^Tx + \alpha^2 e_1^Te_1 - 2\alpha x_1.$$
To prevent cancellation in the first entry of the Householder vector, $x_1 - \alpha$, it is recommended to choose
$$\alpha = -\operatorname{sign}(x_1)\|x\|_2,$$
so that
$$\|x - \alpha e_1\|_2^2 = 2\|x\|_2^2 + 2\|x\|_2|x_1|.$$
If you go to my notes for CSE/Math 55, they show a way to allow you to choose $\alpha$ to have the opposite sign: http://www.cse.psu.edu/~barlow/cse55/chap4.pdf

Show for yourself that $Hx = (I - 2ww^T)x = \alpha e_1$.

To apply a Householder transformation, that is, to compute $C = HB$, we use
$$C = (I - 2ww^T)B = B - 2ww^TB = B - wf^T,$$
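where $f = 2B^Tw$. Thus applying a Householder transformation amounts to a matrix-vector product followed by an outer-product update. The latter computes the entries of $C$ from
$$c_{ij} = b_{ij} - w_i f_j.$$
For $B \in \mathbb{R}^{m \times n}$ this is about $4mn$ arithmetic operations altogether, much less work than multiplying two matrices. Never apply a Householder transformation any other way.

To compute the QR factorization of $X$, let $X = (x_1, \ldots, x_n)$. Choose $H_1$, a Householder transformation such that $H_1x_1 = r_{11}e_1$; thus
$$X_1 = H_1X = \begin{pmatrix} r_{11} & R_{12} \\ 0 & \tilde{X}_1 \end{pmatrix}, \qquad R_{12} \in \mathbb{R}^{1 \times (n-1)},$$
where
$$\tilde{X}_1 = (\tilde{x}_2^{(1)}, \ldots, \tilde{x}_n^{(1)}) \in \mathbb{R}^{(m-1) \times (n-1)}.$$
Choose $\tilde{H}_2$ such that
$$\tilde{H}_2\tilde{x}_2^{(1)} = r_{22}e_1.$$
Then let
$$H_2 = \begin{pmatrix} 1 & 0 \\ 0 & \tilde{H}_2 \end{pmatrix},$$
which leaves the first row unaffected. Then
$$X_2 = H_2X_1 = H_2H_1X = \begin{pmatrix} R_{11}^{(2)} & R_{12}^{(2)} \\ 0 & \tilde{X}_2 \end{pmatrix},$$
where $R_{11}^{(2)} \in \mathbb{R}^{2 \times 2}$ is upper triangular, $R_{12}^{(2)} \in \mathbb{R}^{2 \times (n-2)}$, and $\tilde{X}_2 \in \mathbb{R}^{(m-2) \times (n-2)}$.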
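A minimal Python sketch of the two steps just described: choosing $\alpha = -\operatorname{sign}(x_1)\|x\|_2$ and $w = (x - \alpha e_1)/\|x - \alpha e_1\|_2$, then applying $H = I - 2ww^T$ to a matrix $B$ through $f = 2B^Tw$ and $c_{ij} = b_{ij} - w_i f_j$, without ever forming $H$. The names `house` and `apply_house` are hypothetical, and treating $\operatorname{sign}(0)$ as $+1$ is an assumption of this sketch.

```python
import math


def house(x):
    """Return (w, alpha) with ||w||_2 = 1 and (I - 2 w w^T) x = alpha * e_1."""
    normx = math.sqrt(sum(t * t for t in x))
    if normx == 0.0:
        # Nothing to annihilate: w = e_1 gives H x = 0 = 0 * e_1.
        return [1.0] + [0.0] * (len(x) - 1), 0.0
    # alpha = -sign(x_1) ||x||_2 (sign(0) taken as +1), so x_1 - alpha
    # adds two numbers of the same sign and cannot cancel.
    alpha = -normx if x[0] >= 0 else normx
    v = list(x)
    v[0] -= alpha                                 # v = x - alpha e_1
    normv = math.sqrt(sum(t * t for t in v))
    return [t / normv for t in v], alpha


def apply_house(w, B):
    """Return C = (I - 2 w w^T) B = B - w f^T, where f = 2 B^T w (about 4mn flops)."""
    m, n = len(B), len(B[0])
    f = [2.0 * sum(w[i] * B[i][j] for i in range(m)) for j in range(n)]  # f = 2 B^T w
    return [[B[i][j] - w[i] * f[j] for j in range(n)] for i in range(m)]  # c_ij = b_ij - w_i f_j


B = [[3.0, 1.0],
     [4.0, 2.0],
     [0.0, 5.0]]
w, alpha = house([row[0] for row in B])           # target the first column
C = apply_house(w, B)
print(alpha)                                      # -5.0, since x = (3, 4, 0) and x_1 > 0
print([row[0] for row in C])                      # approximately (alpha, 0, 0)
```

Because $H$ is orthogonal, the second column of $C$ keeps the norm $\sqrt{30}$ of the second column of $B$, which is a cheap way to check the update.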
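Suppose
$$X_{k-1} = H_{k-1} \cdots H_1X = \begin{pmatrix} R_{11}^{(k-1)} & R_{12}^{(k-1)} \\ 0 & \tilde{X}_{k-1} \end{pmatrix},$$
where $R_{11}^{(k-1)} \in \mathbb{R}^{(k-1) \times (k-1)}$ is upper triangular, $R_{12}^{(k-1)} \in \mathbb{R}^{(k-1) \times (n-k+1)}$, and
$$\tilde{X}_{k-1} = (\tilde{x}_k^{(k-1)}, \ldots, \tilde{x}_n^{(k-1)}) \in \mathbb{R}^{(m-k+1) \times (n-k+1)}.$$
Choose $\tilde{H}_k$ such that
$$\tilde{H}_k\tilde{x}_k^{(k-1)} = r_{kk}e_1,$$
and let
$$H_k = \begin{pmatrix} I_{k-1} & 0 \\ 0 & \tilde{H}_k \end{pmatrix}.$$
Then
$$X_k = H_kX_{k-1} = H_k \cdots H_1X = \begin{pmatrix} R_{11}^{(k)} & R_{12}^{(k)} \\ 0 & \tilde{X}_k \end{pmatrix},$$
with $R_{11}^{(k)} \in \mathbb{R}^{k \times k}$ upper triangular and $\tilde{X}_k \in \mathbb{R}^{(m-k) \times (n-k)}$. At $k = n$ we have
$$X_n = H_n \cdots H_1X = \begin{pmatrix} R \\ 0 \end{pmatrix} \begin{matrix} n \\ m-n \end{matrix},$$
where $R$ is upper triangular and nonsingular. Thus
$$Q^T = H_n \cdots H_1 \quad\text{and}\quad Q = H_1 \cdots H_n, \tag{2}$$
where the second identity uses $H_k = H_k^T$.

MATLAB explicitly computes $Q$, but that is not necessary; it is done to conform with a software philosophy. If we were to write a routine for this in C++ or some other compiled language, we might prefer to store the vectors $w_1, \ldots, w_n$ that define $H_k = I - 2w_kw_k^T$.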
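A sketch of the whole column-by-column process in Python, storing only the Householder vectors rather than forming $Q$ (unlike the compact scheme that overwrites $X$, it keeps them in a separate list `W` for readability). Indices are 0-based in the code, so `W[k]` holds the nonzero tail, of length $m-k$, of $w_{k+1}$. To check the result, $Q\begin{pmatrix} R \\ 0 \end{pmatrix} = H_1 \cdots H_n\begin{pmatrix} R \\ 0 \end{pmatrix}$ is computed by applying $H_n$ first and compared with $X$. The function names and the sample $X$ are illustrative only.

```python
import math


def house(x):
    """Return (w, alpha) with ||w||_2 = 1 and (I - 2 w w^T) x = alpha * e_1."""
    normx = math.sqrt(sum(t * t for t in x))
    if normx == 0.0:
        return [1.0] + [0.0] * (len(x) - 1), 0.0
    alpha = -normx if x[0] >= 0 else normx        # alpha = -sign(x_1) ||x||_2
    v = list(x)
    v[0] -= alpha                                 # v = x - alpha e_1
    normv = math.sqrt(sum(t * t for t in v))
    return [t / normv for t in v], alpha


def householder_qr(X):
    """Householder QR of an m-by-n X (m >= n), keeping Q in factored form.

    Returns (W, R): W[k] (0-based k) is the nonzero tail of w_{k+1},
    of length m - k, and R is the n-by-n upper triangular factor.
    """
    m, n = len(X), len(X[0])
    A = [list(row) for row in X]                  # work on a copy of X
    W = []
    for k in range(n):
        w, alpha = house([A[i][k] for i in range(k, m)])
        W.append(w)
        # Apply diag(I_k, I - 2 w w^T) to the trailing columns as B - w f^T.
        for j in range(k + 1, n):
            fj = 2.0 * sum(w[i - k] * A[i][j] for i in range(k, m))
            for i in range(k, m):
                A[i][j] -= w[i - k] * fj
        A[k][k] = alpha                           # r_kk; entries below it are ignored
    R = [[A[i][j] if j >= i else 0.0 for j in range(n)] for i in range(n)]
    return W, R


def apply_q(W, y):
    """Return Q y = H_1 H_2 ... H_n y without forming Q (H_n is applied first)."""
    y = list(y)
    for k in reversed(range(len(W))):
        s = 2.0 * sum(W[k][i - k] * y[i] for i in range(k, len(y)))
        for i in range(k, len(y)):
            y[i] -= W[k][i - k] * s
    return y


X = [[1.0, 2.0],
     [3.0, 4.0],
     [5.0, 6.0],
     [7.0, 8.5]]
m, n = len(X), len(X[0])
W, R = householder_qr(X)
# Column j of Q (R; 0) should reproduce column j of X.
cols = [apply_q(W, [R[i][j] for i in range(n)] + [0.0] * (m - n)) for j in range(n)]
print(max(abs(cols[j][i] - X[i][j]) for j in range(n) for i in range(m)))  # reconstruction error
```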
Since the first $k-1$ components of $w_k$ are zero, in many codes $X$ is overwritten by (in a $4 \times 3$ case)
$$\begin{pmatrix} w_{11} & r_{12} & r_{13} \\ w_{21} & w_{22} & r_{23} \\ w_{31} & w_{32} & w_{33} \\ w_{41} & w_{42} & w_{43} \end{pmatrix}.$$
The diagonal of $R$ can be stored in an extra vector, say $(r_{11}, r_{22}, r_{33})^T$.

Now we can fill in some blanks. If $Q$ is as in (2), we compute $c$ from
$$c = H_n \cdots H_1b = Q^Tb$$
and $r_{LS}$ from
$$r_{LS} = H_1 \cdots H_n \begin{pmatrix} 0 \\ c_2 \end{pmatrix}.$$
Then we recover the solution from $Ry_{LS} = c_1$.

It turns out that the Householder QR decomposition gives us left orthogonal basis matrices for two important subspaces.

Range and Null Spaces

The range of a matrix $X \in \mathbb{R}^{m \times n}$ is the set
$$\operatorname{Range}(X) = \{Xy : y \in \mathbb{R}^n\}.$$
The range is a subspace of $\mathbb{R}^m$. The rank of $X$ is the dimension of its range space and is written $\operatorname{rank}(X)$. Clearly, $\operatorname{rank}(X) \le \min\{m, n\}$. The span of a set of vectors $x_1, \ldots, x_n$ is given by
$$\operatorname{span}\{x_1, \ldots, x_n\} = \operatorname{Range}(X), \qquad X = (x_1, \ldots, x_n).$$
If $\operatorname{rank}(X) = n$, then $x_1, \ldots, x_n$ is a basis for $\operatorname{Range}(X)$ and $X$ is a basis matrix. A matrix $X$ is rank deficient if $\operatorname{rank}(X) < \min\{m, n\}$, is said to have full column rank if $\operatorname{rank}(X) = n$, and is said to have full row rank if $\operatorname{rank}(X) = m$.

The null space of $X$ is the linear subspace of $\mathbb{R}^n$ given by
$$\operatorname{Null}(X) = \{y \in \mathbb{R}^n : Xy = 0\}.$$
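Returning to the least squares procedure: the Python sketch below fills in the blanks using only the stored Householder vectors, computing $c = H_n \cdots H_1b$, solving $Ry_{LS} = c_1$ by back substitution, and forming $r_{LS} = H_1 \cdots H_n\begin{pmatrix} 0 \\ c_2 \end{pmatrix}$. It repeats illustrative `house`/`householder_qr` helpers so it runs on its own; all names are hypothetical. For the sample data, the normal equations $X^TXy = X^Tb$ give $y_{LS} = (4/3, 7/3)^T$ and $r_{LS} = (-1/3, -1/3, 1/3)^T$ by hand, which the code should reproduce up to rounding.

```python
import math


def house(x):
    """Return (w, alpha) with ||w||_2 = 1 and (I - 2 w w^T) x = alpha * e_1."""
    normx = math.sqrt(sum(t * t for t in x))
    if normx == 0.0:
        return [1.0] + [0.0] * (len(x) - 1), 0.0
    alpha = -normx if x[0] >= 0 else normx        # alpha = -sign(x_1) ||x||_2
    v = list(x)
    v[0] -= alpha
    normv = math.sqrt(sum(t * t for t in v))
    return [t / normv for t in v], alpha


def reflect(w, y, k):
    """Overwrite y with diag(I_k, I - 2 w w^T) y (0-based k)."""
    s = 2.0 * sum(w[i - k] * y[i] for i in range(k, len(y)))
    for i in range(k, len(y)):
        y[i] -= w[i - k] * s


def householder_qr(X):
    """Return (W, R): W[k] is the tail of w_{k+1} (0-based k), R is n-by-n upper triangular."""
    m, n = len(X), len(X[0])
    A = [list(row) for row in X]
    W = []
    for k in range(n):
        w, alpha = house([A[i][k] for i in range(k, m)])
        W.append(w)
        for j in range(k + 1, n):                 # apply H_{k+1} to the trailing columns
            col = [A[i][j] for i in range(m)]
            reflect(w, col, k)
            for i in range(k, m):
                A[i][j] = col[i]
        A[k][k] = alpha                           # r_kk
    R = [[A[i][j] if j >= i else 0.0 for j in range(n)] for i in range(n)]
    return W, R


def least_squares(X, b):
    """Minimize ||b - X y||_2 for full-column-rank X; return (y_LS, r_LS)."""
    m, n = len(X), len(X[0])
    W, R = householder_qr(X)
    c = list(b)
    for k in range(n):                            # c = H_n ... H_1 b = Q^T b
        reflect(W[k], c, k)
    y = [0.0] * n
    for i in reversed(range(n)):                  # back substitution for R y = c_1
        y[i] = (c[i] - sum(R[i][j] * y[j] for j in range(i + 1, n))) / R[i][i]
    r = [0.0] * n + c[n:]                         # (0; c_2)
    for k in reversed(range(n)):                  # r_LS = H_1 ... H_n (0; c_2)
        reflect(W[k], r, k)
    return y, r


X = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]
b = [1.0, 2.0, 4.0]
y, r = least_squares(X, b)
print(y)   # close to [4/3, 7/3]
print(r)   # close to [-1/3, -1/3, 1/3]
XtR = [sum(X[i][j] * r[i] for i in range(len(X))) for j in range(len(X[0]))]  # X^T r_LS
```

The last line checks the orthogonality $X^Tr_{LS} = 0$ derived earlier; the entries of `XtR` should be at rounding level.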
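The columns of $Q$ contain bases for two important subspaces. Let
$$Q = \begin{pmatrix} Q_1 & Q_2 \end{pmatrix},$$
where $Q_1$ has $n$ columns and $Q_2$ has $m-n$ columns. The matrices $Q_1 \in \mathbb{R}^{m \times n}$ and $Q_2 \in \mathbb{R}^{m \times (m-n)}$ are left orthogonal matrices ($Q_1^TQ_1 = I_n$, $Q_2^TQ_2 = I_{m-n}$) satisfying
$$Q_1^TQ_2 = 0.$$
Since $X = Q_1R$, it is easily verified that $\operatorname{Range}(Q_1) = \operatorname{Range}(X)$. That is, the columns of $Q_1$ are an orthonormal basis for $\operatorname{Range}(X)$. Since
$$X^TQ_2 = R^TQ_1^TQ_2 = 0,$$
one can show that
$$\operatorname{Range}(Q_2) = \operatorname{Range}(X)^{\perp} = \operatorname{Null}(X^T).$$
From these two matrices we get the orthogonal projections
$$P_1 = Q_1Q_1^T, \qquad P_2 = Q_2Q_2^T$$
onto the spaces $\operatorname{Range}(X)$ and $\operatorname{Null}(X^T)$, respectively. However, since $QQ^T = Q_1Q_1^T + Q_2Q_2^T = I$, we have
$$P_2 = I - P_1, \qquad P_1 = I - P_2,$$
so if we have one, we have the other!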
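A small numerical illustration of the splitting $Q = (Q_1\ Q_2)$: the Python sketch forms the columns of $Q$ as $H_1 \cdots H_ne_j$, builds $P_1 = Q_1Q_1^T$ and $P_2 = Q_2Q_2^T$, and then lets you check that $P_1 + P_2 = I$, $X^TQ_2 = 0$, and $P_1X = X$. Forming $Q$ explicitly is done only for the illustration; the helper names (`householder_vectors`, `projector`) and the sample $X$ are hypothetical.

```python
import math


def house(x):
    """Return (w, alpha) with ||w||_2 = 1 and (I - 2 w w^T) x = alpha * e_1."""
    normx = math.sqrt(sum(t * t for t in x))
    if normx == 0.0:
        return [1.0] + [0.0] * (len(x) - 1), 0.0
    alpha = -normx if x[0] >= 0 else normx        # alpha = -sign(x_1) ||x||_2
    v = list(x)
    v[0] -= alpha
    normv = math.sqrt(sum(t * t for t in v))
    return [t / normv for t in v], alpha


def reflect(w, y, k):
    """Overwrite y with diag(I_k, I - 2 w w^T) y (0-based k)."""
    s = 2.0 * sum(w[i - k] * y[i] for i in range(k, len(y)))
    for i in range(k, len(y)):
        y[i] -= w[i - k] * s


def householder_vectors(X):
    """Return the tails of w_1, ..., w_n from Householder QR of X (R is not kept)."""
    m, n = len(X), len(X[0])
    A = [list(row) for row in X]
    W = []
    for k in range(n):
        w, _ = house([A[i][k] for i in range(k, m)])
        W.append(w)
        for j in range(k + 1, n):                 # update the trailing columns
            col = [A[i][j] for i in range(m)]
            reflect(w, col, k)
            for i in range(k, m):
                A[i][j] = col[i]
    return W


def projector(cols):
    """Return the sum of q q^T over the given columns, i.e. Q_i Q_i^T."""
    size = len(cols[0])
    return [[sum(q[i] * q[j] for q in cols) for j in range(size)] for i in range(size)]


X = [[1.0, 2.0],
     [3.0, 4.0],
     [5.0, 6.0],
     [7.0, 8.5]]
m, n = len(X), len(X[0])
W = householder_vectors(X)

# Column j of Q is Q e_j = H_1 ... H_n e_j (apply H_n first).
Qcols = []
for j in range(m):
    e = [1.0 if i == j else 0.0 for i in range(m)]
    for k in reversed(range(n)):
        reflect(W[k], e, k)
    Qcols.append(e)
Q1, Q2 = Qcols[:n], Qcols[n:]                     # stored as lists of columns

P1, P2 = projector(Q1), projector(Q2)
XtQ2 = [[sum(X[i][j] * q[i] for i in range(m)) for q in Q2] for j in range(n)]  # X^T Q_2
P1X = [[sum(P1[i][l] * X[l][j] for l in range(m)) for j in range(n)] for i in range(m)]
```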