Math 57, Spring 2018

Projections and Least Squares Solutions

Recall that given an inner product space $V$ with subspace $W$ and an orthogonal basis $B = \{v_1, v_2, \dots, v_k\}$ for $W$, the orthogonal projection of $V$ onto $W$ is defined by
$$\mathrm{Proj}_W(u) = \frac{\langle u, v_1\rangle}{\langle v_1, v_1\rangle}v_1 + \frac{\langle u, v_2\rangle}{\langle v_2, v_2\rangle}v_2 + \cdots + \frac{\langle u, v_k\rangle}{\langle v_k, v_k\rangle}v_k.$$
This function has the following properties:
(a) It is a linear operator on $V$.
(b) It is a projection operator: $P^2 = P$.
(c) Given a vector $v$, the vector $v - \mathrm{Proj}_W(v)$ is orthogonal to every vector in $W$.

In this set of notes, I address the following question: Suppose $T : U \to V$ is linear but not onto. If $b$ is not in the range of $T$, what can we say about $T(u) = b$ other than that it is inconsistent? That is, what is the best we can do in a situation like this?

Definition 1. A vector $u$ is called a least squares solution to $T(x) = b$ if $\|T(u) - b\|$ is a minimum over all of $U$. That is, $\|T(u) - b\| \le \|T(x) - b\|$ for every $x \in U$.

One way to find a least squares solution is to find an expression for $\|T(x) - b\|$, or more likely $\|T(x) - b\|^2$, and use calculus or some other approach to find a minimum. For example, suppose that $T : \mathbb{R}^{2\times 2} \to \mathbb{R}^{2\times 2}$ is defined by
$$T\begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{pmatrix} a+b & a+b+c \\ a+b+c & a+b+c+d \end{pmatrix}$$
and suppose we want the least squares solution to $T(A) = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}$. An inner product on $\mathbb{R}^{2\times 2}$ is $\langle A, B\rangle = \mathrm{tr}(B^t A)$. We want
$$\left\langle \begin{pmatrix} a+b & a+b+c \\ a+b+c & a+b+c+d \end{pmatrix} - \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix},\ \begin{pmatrix} a+b & a+b+c \\ a+b+c & a+b+c+d \end{pmatrix} - \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix} \right\rangle$$
as small as possible. This calls for minimizing the expression
$$(a+b-1)^2 + (a+b+c-2)^2 + (a+b+c-3)^2 + (a+b+c+d-4)^2$$
over all $a, b, c, d \in \mathbb{R}$. As we might have learned in Calc II, one way to do this is to get a system of equations in $a, b, c, d$ by taking partial derivatives with respect to $a, b, c, d$ and setting them equal to $0$. For us, the system is
$$\begin{aligned} 4a + 4b + 3c + d &= 10 \\ 4a + 4b + 3c + d &= 10 \\ 3a + 3b + 3c + d &= 9 \\ a + b + c + d &= 4 \end{aligned}$$
The solution to this system is $a = 1 - b$, $c = 3/2$, $d = 3/2$. That is, there are infinitely many least squares solutions, with one of them (taking $b = 0$) being $A = \begin{pmatrix} 1 & 0 \\ 3/2 & 3/2 \end{pmatrix}$, with $T(A) = \begin{pmatrix} 1 & 5/2 \\ 5/2 & 4 \end{pmatrix}$.
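As a quick sanity check on this arithmetic, here is a short Python sketch (my own illustration, not part of the original notes) that uses exact rational arithmetic to verify that $a = 1$, $b = 0$, $c = d = 3/2$ satisfies the partial-derivative system, and that small perturbations do no better. The helper name `residual_sq` is mine.

```python
from fractions import Fraction as F

def residual_sq(a, b, c, d):
    # ||T(A) - B||^2 = (a+b-1)^2 + (a+b+c-2)^2 + (a+b+c-3)^2 + (a+b+c+d-4)^2
    return (a + b - 1)**2 + (a + b + c - 2)**2 + (a + b + c - 3)**2 + (a + b + c + d - 4)**2

# one of the least squares solutions found above
a, b, c, d = F(1), F(0), F(3, 2), F(3, 2)

# it satisfies the system obtained by setting the partials equal to 0
assert 4*a + 4*b + 3*c + d == 10
assert 3*a + 3*b + 3*c + d == 9
assert a + b + c + d == 4

best = residual_sq(a, b, c, d)
print(best)  # minimum value of the squared distance: 1/2

# perturbing any single variable cannot do better
for eps in (F(1, 10), F(-1, 10)):
    assert residual_sq(a + eps, b, c, d) >= best
    assert residual_sq(a, b, c + eps, d) >= best
    assert residual_sq(a, b, c, d + eps) >= best
```

The minimum squared distance is $1/2$, coming from the two middle terms: the off-diagonal entries of $T(A)$ are forced to be equal, so the best they can do against the targets $2$ and $3$ is to split the difference at $5/2$.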
Now let's apply the theory of projections to this problem.
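Before doing so, the projection formula and properties (b) and (c) recalled above are easy to check numerically. Here is a small Python sketch (my own, with a made-up orthogonal basis for a subspace of $\mathbb{R}^3$; exact arithmetic via the `fractions` module):

```python
from fractions import Fraction as F

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def proj(u, basis):
    # Proj_W(u) = sum_i <u, v_i>/<v_i, v_i> v_i, for an ORTHOGONAL basis of W
    out = [F(0)] * len(u)
    for v in basis:
        coeff = F(dot(u, v), dot(v, v))
        out = [o + coeff * x for o, x in zip(out, v)]
    return out

# a hypothetical subspace W of R^3 with orthogonal basis {(1,1,1), (1,0,-1)}
basis = [(1, 1, 1), (1, 0, -1)]
assert dot(basis[0], basis[1]) == 0

u = (1, 2, 4)
p = proj(u, basis)
print(p)  # [Fraction(5, 6), Fraction(7, 3), Fraction(23, 6)]

# property (b): applying the projection twice changes nothing (P^2 = P)
assert proj(p, basis) == p

# property (c): u - Proj_W(u) is orthogonal to every basis vector of W
r = [a - b for a, b in zip(u, p)]
assert all(dot(r, v) == 0 for v in basis)
```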
Theorem 1. A vector $u$ is a least squares solution to $T(x) = b$ if and only if $u$ is a solution to $T(x) = \mathrm{Proj}_W(b)$, where $W$ is the range of $T$.

Proof: Let $w = \mathrm{Proj}_W(b)$ and let $x$ be any vector in $U$. We have
$$\begin{aligned} \|T(x) - b\|^2 &= \langle T(x) - b, T(x) - b\rangle = \langle (T(x) - w) - (b - w),\ (T(x) - w) - (b - w)\rangle \\ &= \langle T(x) - w, T(x) - w\rangle - \langle T(x) - w, b - w\rangle - \langle b - w, T(x) - w\rangle + \langle b - w, b - w\rangle. \end{aligned}$$
However, $\langle T(x) - w, b - w\rangle = 0$ for all $x \in U$. This is because $T(x) - w$ is in $W$, the range of $T$, but $b - w = b - \mathrm{Proj}_W(b)$ is orthogonal to every vector in $W$. Consequently,
$$\|T(x) - b\|^2 = \|T(x) - w\|^2 + \|b - w\|^2.$$
Since $\|b - w\|$ is a constant, independent of $x$, to minimize this quantity we must minimize $\|T(x) - w\|$, which happens if and only if $x$ is a solution to $T(x) = w$.

Back to our example. An alternative to calculus for solving our problem is to project $u = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}$ onto the range of $T$ and solve the resulting equation. The range of $T$ (the symmetric $2 \times 2$ matrices) has basis
$$\{v_1, v_2, v_3\} = \left\{ \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} \right\}.$$
By a happy coincidence, this is an orthogonal basis. We have
$$\mathrm{Proj}_W(u) = \frac{\langle u, v_1\rangle}{\langle v_1, v_1\rangle}v_1 + \frac{\langle u, v_2\rangle}{\langle v_2, v_2\rangle}v_2 + \frac{\langle u, v_3\rangle}{\langle v_3, v_3\rangle}v_3 = \frac{1}{1}\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} + \frac{5}{2}\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} + \frac{4}{1}\begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 5/2 \\ 5/2 & 4 \end{pmatrix},$$
so we solve $\begin{pmatrix} a+b & a+b+c \\ a+b+c & a+b+c+d \end{pmatrix} = \begin{pmatrix} 1 & 5/2 \\ 5/2 & 4 \end{pmatrix}$. This leads again to $a + b = 1$, $c = 3/2$, $d = 3/2$.

Least Squares Solutions to Systems of Equations

We can say much more in the case of solving (inconsistent) systems of equations $Ax = b$, where we use the usual dot product for our inner product. By Theorem 1, a least squares solution will be a solution to $Ax = w$, where $w$ is the projection of $b$ onto the column space of $A$, since the column space of $A$ is the range of the transformation $T(x) = Ax$. Here is an example: Suppose we want the least squares solution to
$$\begin{pmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 3 \\ 6 \\ 3 \end{pmatrix}.$$
Note
that the system is inconsistent. What we seek are $x$ and $y$ so that $Ax$ is as close as possible to $b$. One way is with calculus: minimize
$$\left\langle \begin{pmatrix} x + 2y \\ 3x + 4y \\ 5x + 6y \end{pmatrix} - \begin{pmatrix} 3 \\ 6 \\ 3 \end{pmatrix},\ \begin{pmatrix} x + 2y \\ 3x + 4y \\ 5x + 6y \end{pmatrix} - \begin{pmatrix} 3 \\ 6 \\ 3 \end{pmatrix} \right\rangle = (x + 2y - 3)^2 + (3x + 4y - 6)^2 + (5x + 6y - 3)^2.$$
Using partial derivatives, we get the system of equations
$$\begin{aligned} 35x + 44y &= 36 \\ 44x + 56y &= 48, \end{aligned}$$
which has solution $x = -4$, $y = 4$.

Alternatively, we project onto the column space of $A$. For that, we find an orthogonal basis, say
$$\left\{ \begin{pmatrix} 1 \\ 3 \\ 5 \end{pmatrix}, \begin{pmatrix} 13 \\ 4 \\ -5 \end{pmatrix} \right\},$$
for the column space of $A$. The projection of $\begin{pmatrix} 3 \\ 6 \\ 3 \end{pmatrix}$ onto the column space is
$$\frac{36}{35}\begin{pmatrix} 1 \\ 3 \\ 5 \end{pmatrix} + \frac{48}{210}\begin{pmatrix} 13 \\ 4 \\ -5 \end{pmatrix} = \begin{pmatrix} 4 \\ 4 \\ 4 \end{pmatrix}.$$
Now solve the new system
$$\begin{pmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 4 \\ 4 \\ 4 \end{pmatrix},$$
a consistent system with solution $\begin{pmatrix} -4 \\ 4 \end{pmatrix}$.

But in this setting, things are nicer than in general.

Theorem 2. If $A$ is an $m \times n$ matrix, then the set of least squares solutions to $Ax = b$ is the same as the set of solutions to $A^t A x = A^t b$. In particular, this last equation is always consistent.

Proof: Let $W$ be the column space of $A$. Then $W^\perp = \mathrm{null}(A^t)$. This is because $A^t v$ is a column vector whose entries are the dot products of $v$ with the rows of $A^t$, which are the columns of $A$. So if $A^t v = 0$, then $v$ must be orthogonal to each of the columns of $A$, meaning $v \in W^\perp$. Similarly, if $v \in W^\perp$ then $v$ is orthogonal to every row of $A^t$, and so $A^t v = 0$ and $v \in \mathrm{null}(A^t)$.

Suppose that $u$ is a least squares solution to $Ax = b$. Then $Au = \mathrm{Proj}_W(b)$, so $b - Au = b - \mathrm{Proj}_W(b)$, a vector orthogonal to every vector in $W$. That is, $b - Au \in W^\perp$. By the comment above, this means $b - Au$ is in the null space of $A^t$, so $A^t(b - Au) = 0$. This rearranges to $A^t A u = A^t b$.

Similarly, if $A^t A u = A^t b$, then $A^t(b - Au) = 0$, so $b - Au \in \mathrm{null}(A^t) = W^\perp$. If we write $v = Au - b \in W^\perp$, then $Au = v + b$. Since $\mathrm{Proj}_W$ is a projection, $\mathrm{Proj}_W(w) = w$ for every $w \in W$. This means $\mathrm{Proj}_W(Au) = Au$. We have
$$Au = \mathrm{Proj}_W(Au) = \mathrm{Proj}_W(v + b) = \mathrm{Proj}_W(v) + \mathrm{Proj}_W(b).$$
But $v$ is orthogonal to all vectors in $W$, so $\mathrm{Proj}_W(v) = 0$, giving $Au = \mathrm{Proj}_W(b)$.

Back to our previous example, we can now find the least squares solution to $\begin{pmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix}$
$= \begin{pmatrix} 3 \\ 6 \\ 3 \end{pmatrix}$ by multiplying both sides by $A^t$. Following this approach,
$$\begin{pmatrix} 1 & 3 & 5 \\ 2 & 4 & 6 \end{pmatrix}\begin{pmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 1 & 3 & 5 \\ 2 & 4 & 6 \end{pmatrix}\begin{pmatrix} 3 \\ 6 \\ 3 \end{pmatrix},$$
or
$$\begin{pmatrix} 35 & 44 \\ 44 & 56 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 36 \\ 48 \end{pmatrix},$$
a system that we solved earlier.

In general, if $A$ is an $m \times n$ matrix of rank $n$, then $A^t A$ will be an invertible $n \times n$ matrix. More generally, the rank of $A^t A$ is the same as the rank of $A$. When $A^t A$ is invertible, we can find the least squares solution to $Ax = b$ via the formula $x = (A^t A)^{-1} A^t b$. Moreover, since we are really solving $Ax = \mathrm{Proj}_W(b)$, we end up with a formula for the projection:
$$\mathrm{Proj}_W(b) = A(A^t A)^{-1} A^t b.$$
More generally,

Theorem 3. Let $W$ be a subspace of $\mathbb{R}^n$ with basis $\{w_1, w_2, \dots, w_k\}$. Then the standard matrix for the projection of $\mathbb{R}^n$ onto $W$ is $A(A^t A)^{-1} A^t$, where $A$ has the vectors $w_i$ as columns.

As an example, on Homework 9 I asked for the standard matrix for the orthogonal projection of $\mathbb{R}^3$ onto the plane $x + y - 2z = 0$. Here are two solutions, one not making use of Theorem 3, the other using Theorem 3. For the homework, you should have used the first approach.

The first method is to let $W$ be the space defined by the plane, to find an orthogonal basis for the plane, and to use the formula for a projection in terms of an orthogonal basis to get the answer. In the solutions, I found an orthogonal basis by inspection, but let's say we did not do that. Writing $x = -y + 2z$, and writing things horizontally to save vertical space, a basis is $\{(-1, 1, 0), (2, 0, 1)\}$. Orthogonalizing this,
$$b_2 = (2, 0, 1) - \frac{\langle (2, 0, 1), (-1, 1, 0)\rangle}{\langle (-1, 1, 0), (-1, 1, 0)\rangle}(-1, 1, 0) = (2, 0, 1) + (-1, 1, 0) = (1, 1, 1).$$
This is nice: the natural basis for the plane transformed into the basis I used in the homework solution. The rest of the solution is now the same:
$$T(x, y, z) = \frac{\langle (x, y, z), (-1, 1, 0)\rangle}{\langle (-1, 1, 0), (-1, 1, 0)\rangle}(-1, 1, 0) + \frac{\langle (x, y, z), (1, 1, 1)\rangle}{\langle (1, 1, 1), (1, 1, 1)\rangle}(1, 1, 1) = \frac{-x + y}{2}(-1, 1, 0) + \frac{x + y + z}{3}(1, 1, 1),$$
with matrix
$$\begin{pmatrix} 1/2 & -1/2 & 0 \\ -1/2 & 1/2 & 0 \\ 0 & 0 & 0 \end{pmatrix} + \begin{pmatrix} 1/3 & 1/3 & 1/3 \\ 1/3 & 1/3 & 1/3 \\ 1/3 & 1/3 & 1/3 \end{pmatrix} = \begin{pmatrix} 5/6 & -1/6 & 1/3 \\ -1/6 & 5/6 & 1/3 \\ 1/3 & 1/3 & 1/3 \end{pmatrix}.$$
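The normal equations of Theorem 2 are also easy to run by machine. A minimal Python sketch (my own check, using exact rationals and Cramer's rule for the $2 \times 2$ solve) reproduces the numbers of the $3 \times 2$ example above:

```python
from fractions import Fraction as F

A = [[1, 2], [3, 4], [5, 6]]
b = [3, 6, 3]

# form the normal equations (A^t A) x = A^t b
AtA = [[sum(A[k][i] * A[k][j] for k in range(3)) for j in range(2)] for i in range(2)]
Atb = [sum(A[k][i] * b[k] for k in range(3)) for i in range(2)]
assert AtA == [[35, 44], [44, 56]]
assert Atb == [36, 48]

# solve the 2x2 system by Cramer's rule
det = AtA[0][0] * AtA[1][1] - AtA[0][1] * AtA[1][0]
x = F(Atb[0] * AtA[1][1] - AtA[0][1] * Atb[1], det)
y = F(AtA[0][0] * Atb[1] - Atb[0] * AtA[1][0], det)
print(x, y)  # -4 4

# Ax is the projection of b onto the column space of A: (4, 4, 4)
assert [A[k][0] * x + A[k][1] * y for k in range(3)] == [4, 4, 4]
```

Note that no orthogonal basis was needed: forming $A^t A$ and $A^t b$ replaces the Gram-Schmidt step entirely, which is why the normal equations are the standard computational route.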
The calculation making use of Theorem 3 would go like this: Start with the same basis as before, $\{(-1, 1, 0), (2, 0, 1)\}$, and form a matrix $A$ with these basis vectors as its columns. Then the matrix of the projection is $A(A^t A)^{-1} A^t$. In this case,
$$A^t A = \begin{pmatrix} -1 & 1 & 0 \\ 2 & 0 & 1 \end{pmatrix}\begin{pmatrix} -1 & 2 \\ 1 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 2 & -2 \\ -2 & 5 \end{pmatrix}.$$
The matrix of the transformation is
$$A(A^t A)^{-1} A^t = \begin{pmatrix} -1 & 2 \\ 1 & 0 \\ 0 & 1 \end{pmatrix}\cdot\frac{1}{6}\begin{pmatrix} 5 & 2 \\ 2 & 2 \end{pmatrix}\begin{pmatrix} -1 & 1 & 0 \\ 2 & 0 & 1 \end{pmatrix} = \frac{1}{6}\begin{pmatrix} -1 & 2 \\ 1 & 0 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} -1 & 5 & 2 \\ 2 & 2 & 2 \end{pmatrix} = \begin{pmatrix} 5/6 & -1/6 & 1/3 \\ -1/6 & 5/6 & 1/3 \\ 1/3 & 1/3 & 1/3 \end{pmatrix},$$
as before.

Some special cases that come up frequently: linear and quadratic fits to data. Suppose we have a bunch of points $(x_1, y_1), (x_2, y_2), \dots, (x_k, y_k)$, and we want the best linear approximation $y = mx + b$ to these points. One can quibble about what "best" means here, but the most common thing to do is to find a least squares solution: for any given point, we have $\begin{pmatrix} x_i & 1 \end{pmatrix}\begin{pmatrix} m \\ b \end{pmatrix} = y_i$, so the problem can be cast as solving
$$\begin{pmatrix} x_1 & 1 \\ x_2 & 1 \\ \vdots & \vdots \\ x_k & 1 \end{pmatrix}\begin{pmatrix} m \\ b \end{pmatrix} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_k \end{pmatrix}.$$
If we wish for a quadratic fit instead, $y = ax^2 + bx + c$, then the least squares problem is
$$\begin{pmatrix} x_1^2 & x_1 & 1 \\ x_2^2 & x_2 & 1 \\ \vdots & \vdots & \vdots \\ x_k^2 & x_k & 1 \end{pmatrix}\begin{pmatrix} a \\ b \\ c \end{pmatrix} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_k \end{pmatrix}.$$
Note the Vandermonde-like nature of the matrix involved. This extends to higher degree polynomials.

Let's do one final example, the ill-conceived example from class: $T : P_2 \to P_2$ defined by $T(p(x)) = p(1) + \frac{(x-1)^2}{2}\,p''(1)$, with the problem being to find the least squares solution to $T(p(x)) = x^2 + x + 1$ with inner product $\langle p(x), q(x)\rangle = \int_0^1 p(x)q(x)\,dx$. I quickly gave up on both approaches to the problem, a calculus-based approach and the approach using
projections. Here are both worked out.

For the calculus-based approach, we seek to minimize $\langle T(p(x)) - (x^2 + x + 1),\ T(p(x)) - (x^2 + x + 1)\rangle$. Letting $p(x) = ax^2 + bx + c$,
$$T(p(x)) = a + b + c + a(x-1)^2 = ax^2 - 2ax + 2a + b + c.$$
Working with the square of the norm, we want the minimum of
$$\int_0^1 \left((a-1)x^2 - (2a+1)x + 2a + b + c - 1\right)^2 dx.$$
In class I mentioned that it is easiest to differentiate under the integral sign, giving a partial derivative with respect to $b$ of
$$2\int_0^1 \left((a-1)x^2 - (2a+1)x + 2a + b + c - 1\right) dx = 2\left(\frac{a-1}{3} - \frac{2a+1}{2} + 2a + b + c - 1\right).$$
Setting this equal to $0$ gives $\frac{4a}{3} + b + c = \frac{11}{6}$. We get the same equation from the partial with respect to $c$. I balked at the partial with respect to $a$, which requires the integral
$$2\int_0^1 \left((a-1)x^2 - (2a+1)x + 2a + b + c - 1\right)\left(x^2 - 2x + 2\right) dx.$$
I used Maple to get $\frac{28a}{15} + \frac{4b}{3} + \frac{4c}{3} = \frac{137}{60}$. Multiplying the first equation by $-4/3$ and adding to the second gives $a = -\frac{29}{16}$, $b + c = \frac{17}{4}$. That is, there are infinitely many least squares solutions, with one of them (taking $b = 0$) being $p(x) = -\frac{29}{16}x^2 + \frac{17}{4}$.

Now let's use the projection method. We project $x^2 + x + 1$ onto the range of $T$ and solve a new equation. The range of $T$ is spanned by $1$ and $(x-1)^2$. If we orthogonalize this, one basis is $\{1,\ (x-1)^2 - \tfrac{1}{3}\} = \{b_1, b_2\}$, and the projection of $x^2 + x + 1$ onto the range will be
$$\frac{\langle x^2 + x + 1, b_1\rangle}{\langle b_1, b_1\rangle}b_1 + \frac{\langle x^2 + x + 1, b_2\rangle}{\langle b_2, b_2\rangle}b_2 = \frac{11}{6} + \frac{-29/180}{4/45}\left((x-1)^2 - \frac{1}{3}\right) = -\frac{29}{16}(x-1)^2 + \frac{39}{16}.$$
Now we solve $T(p(x)) =$ this new vector, or
$$a(x-1)^2 + a + b + c = -\frac{29}{16}(x-1)^2 + \frac{39}{16}.$$
We must have $a = -\frac{29}{16}$ and $b + c = \frac{17}{4}$, as before.
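The fractions in this last example are fiddly enough to be worth a machine check. The Python sketch below (my own verification, assuming the inner product $\langle p, q\rangle = \int_0^1 p(x)q(x)\,dx$ used above) represents a polynomial by its coefficient list, lowest degree first, and integrates exactly:

```python
from fractions import Fraction as F

def polymul(p, q):
    # multiply two polynomials given as coefficient lists (lowest degree first)
    out = [F(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, c in enumerate(q):
            out[i + j] += a * c
    return out

def inner(p, q):
    # <p, q> = integral over [0, 1] of p(x) q(x) dx, computed term by term
    return sum(F(c, i + 1) for i, c in enumerate(polymul(p, q)))

u = [F(1), F(1), F(1)]          # x^2 + x + 1
b1 = [F(1)]                     # the constant polynomial 1
b2 = [F(2, 3), F(-2), F(1)]     # (x-1)^2 - 1/3 = x^2 - 2x + 2/3

assert inner(b1, b2) == 0       # {b1, b2} is an orthogonal basis for the range of T

c1 = inner(u, b1) / inner(b1, b1)
c2 = inner(u, b2) / inner(b2, b2)
print(c1, c2)  # 11/6 -29/16

# Proj(u) = (11/6) b1 - (29/16) b2; its x^2 coefficient is a and its value
# at x = 1 is a + b + c, matching the hand computation above
proj = [c1 * s + c2 * t for s, t in zip(b1 + [F(0), F(0)], b2)]
assert proj[2] == F(-29, 16)    # a = -29/16
assert sum(proj) == F(39, 16)   # a + b + c = 39/16, so b + c = 17/4
```

The two asserts at the end recover exactly the system $a = -29/16$, $b + c = 17/4$ that both hand methods produced.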