Linear Algebra with Computer Science Applications
April 8, 2018

1 Least Squares Problems

1.1 Least Squares Problems

What do you do when Ax = b has no solution? Inconsistent systems arise often in applications. When a solution is demanded and none exists, the best one can do is to find an x that makes Ax as close as possible to b. Think of Ax as an approximation to b. The smaller the distance between b and Ax, measured by ‖b - Ax‖, the better the approximation. The general least-squares problem is to find an x that makes ‖b - Ax‖ as small as possible. The adjective least-squares arises from the fact that ‖b - Ax‖ is the square root of a sum of squares.

Definition 1. If A is m × n and b is in R^m, a least-squares solution of Ax = b is a vector x̂ in R^n such that

‖b - Ax̂‖ ≤ ‖b - Ax‖   for all x in R^n.

The most important aspect of the least-squares problem is that no matter what x we select, the vector Ax necessarily lies in the column space Col A. So we seek an x that makes Ax the closest point in Col A to b. Of course, if b happens to be in Col A, then b = Ax for some x, and such an x is a least-squares solution.

1.2 Solution of the general least-squares problem

Given A and b as above, apply the Best Approximation Theorem to the subspace Col A. Let

b̂ = proj_{Col A} b.

Because b̂ is in the column space of A, the equation Ax = b̂ is consistent, and there is an x̂ in R^n such that Ax̂ = b̂.
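These two facts, that a least-squares solution x̂ minimizes ‖b - Ax‖ (Definition 1) and that, by the construction just described, Ax̂ coincides with the projection b̂ of b onto Col A, can be checked numerically. The sketch below is illustrative only and assumes NumPy; the 3 × 2 system in it is made up for the purpose, and np.linalg.qr is used merely as a convenient way to obtain an orthonormal basis for Col A.

    import numpy as np

    # A made-up inconsistent 3 x 2 system (more equations than unknowns),
    # used only to illustrate the two facts above.
    A = np.array([[1.0, 1.0],
                  [1.0, 2.0],
                  [1.0, 3.0]])
    b = np.array([1.0, 2.0, 4.0])

    # A least-squares solution: an x that minimizes ||b - Ax||.
    x_hat, *_ = np.linalg.lstsq(A, b, rcond=None)

    # proj_{Col A} b, computed from an orthonormal basis of Col A
    # (np.linalg.qr is used here only as a convenient way to get such a basis).
    Q, _ = np.linalg.qr(A)
    b_hat = Q @ (Q.T @ b)

    # A x_hat equals the projection b_hat ...
    assert np.allclose(A @ x_hat, b_hat)

    # ... and no other x does better (Definition 1), checked at random points.
    best = np.linalg.norm(b - A @ x_hat)
    rng = np.random.default_rng(0)
    for _ in range(1000):
        x = x_hat + rng.normal(size=2)
        assert np.linalg.norm(b - A @ x) >= best - 1e-12
    print("x_hat =", x_hat, " ||b - A x_hat|| =", best)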
By construction, b - b̂ is orthogonal to Col A, so b - Ax̂ ⊥ Col A, which is exactly the property that makes x̂ a least-squares solution. We can write this orthogonality column by column: for each column a_j of A,

a_j · (b - Ax̂) = 0,   that is,   a_j^T (b - Ax̂) = 0.

Since the a_j^T are the rows of A^T, we have

A^T (b - Ax̂) = 0.

Therefore, to find x̂ we solve

A^T A x̂ = A^T b.

This matrix equation represents a system of equations called the normal equations for Ax = b.

Theorem 1. The set of least-squares solutions of Ax = b coincides with the nonempty set of solutions of the normal equations A^T A x = A^T b.

1.3 Example

Find a least-squares solution of the inconsistent system Ax = b for

A = \begin{bmatrix} 4 & 0 \\ 0 & 2 \\ 1 & 1 \end{bmatrix}, \qquad b = \begin{bmatrix} 2 \\ 0 \\ 11 \end{bmatrix}.

Solution. To use the normal equations we compute

A^T A = \begin{bmatrix} 4 & 0 & 1 \\ 0 & 2 & 1 \end{bmatrix} \begin{bmatrix} 4 & 0 \\ 0 & 2 \\ 1 & 1 \end{bmatrix} = \begin{bmatrix} 17 & 1 \\ 1 & 5 \end{bmatrix},

A^T b = \begin{bmatrix} 4 & 0 & 1 \\ 0 & 2 & 1 \end{bmatrix} \begin{bmatrix} 2 \\ 0 \\ 11 \end{bmatrix} = \begin{bmatrix} 19 \\ 11 \end{bmatrix}.

Then the equation A^T A x̂ = A^T b becomes

\begin{bmatrix} 17 & 1 \\ 1 & 5 \end{bmatrix} x = \begin{bmatrix} 19 \\ 11 \end{bmatrix}.
Row operations can be used to solve this system, but since A^T A is invertible and 2 × 2, it is probably faster to compute

(A^T A)^{-1} = \frac{1}{84} \begin{bmatrix} 5 & -1 \\ -1 & 17 \end{bmatrix}

and then to solve A^T A x = A^T b as

x̂ = (A^T A)^{-1} A^T b = \frac{1}{84} \begin{bmatrix} 5 & -1 \\ -1 & 17 \end{bmatrix} \begin{bmatrix} 19 \\ 11 \end{bmatrix} = \frac{1}{84} \begin{bmatrix} 84 \\ 168 \end{bmatrix} = \begin{bmatrix} 1 \\ 2 \end{bmatrix}.

1.4 Example

Find a least-squares solution of Ax = b for

A = \begin{bmatrix} 1 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 \\ 1 & 0 & 0 & 1 \end{bmatrix}, \qquad b = \begin{bmatrix} -3 \\ -1 \\ 0 \\ 2 \\ 5 \\ 1 \end{bmatrix}.

Solution. Compute

A^T A = \begin{bmatrix} 6 & 2 & 2 & 2 \\ 2 & 2 & 0 & 0 \\ 2 & 0 & 2 & 0 \\ 2 & 0 & 0 & 2 \end{bmatrix}, \qquad A^T b = \begin{bmatrix} 4 \\ -4 \\ 2 \\ 6 \end{bmatrix}.

The augmented matrix for A^T A x = A^T b is

\begin{bmatrix} 6 & 2 & 2 & 2 & 4 \\ 2 & 2 & 0 & 0 & -4 \\ 2 & 0 & 2 & 0 & 2 \\ 2 & 0 & 0 & 2 & 6 \end{bmatrix} \sim \begin{bmatrix} 1 & 0 & 0 & 1 & 3 \\ 0 & 1 & 0 & -1 & -5 \\ 0 & 0 & 1 & -1 & -2 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}.

The general solution is x_1 = 3 - x_4, x_2 = -5 + x_4, x_3 = -2 + x_4, and x_4 is free. So the general least-squares solution of Ax = b has the form

x̂ = \begin{bmatrix} 3 \\ -5 \\ -2 \\ 0 \end{bmatrix} + x_4 \begin{bmatrix} -1 \\ 1 \\ 1 \\ 1 \end{bmatrix}.

The next theorem gives useful criteria for determining when there is only one least-squares solution of Ax = b. (Of course, the orthogonal projection b̂ is always unique.)
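Before turning to that theorem, both examples above can be verified numerically. The sketch below is illustrative and assumes NumPy (it is not part of the original computations): it solves the normal equations directly for the example of Section 1.3, and for the example of Section 1.4 it confirms that A^T A is singular, which is exactly why that system has infinitely many least-squares solutions.

    import numpy as np

    # Example of Section 1.3: form and solve the normal equations A^T A x = A^T b.
    A = np.array([[4.0, 0.0],
                  [0.0, 2.0],
                  [1.0, 1.0]])
    b = np.array([2.0, 0.0, 11.0])

    AtA = A.T @ A                     # [[17, 1], [1, 5]]
    Atb = A.T @ b                     # [19, 11]
    x_hat = np.linalg.solve(AtA, Atb)
    print(x_hat)                      # [1. 2.]

    # Example of Section 1.4: A^T A is singular, so the normal equations are
    # consistent but have infinitely many solutions.
    A2 = np.array([[1, 1, 0, 0],
                   [1, 1, 0, 0],
                   [1, 0, 1, 0],
                   [1, 0, 1, 0],
                   [1, 0, 0, 1],
                   [1, 0, 0, 1]], dtype=float)
    b2 = np.array([-3.0, -1.0, 0.0, 2.0, 5.0, 1.0])

    print(np.linalg.matrix_rank(A2.T @ A2))   # 3, so the 4 x 4 matrix A^T A is not invertible

    # One particular least-squares solution (np.linalg.lstsq returns the one of
    # minimum norm); like every vector (3, -5, -2, 0) + x4*(-1, 1, 1, 1), it
    # satisfies the normal equations.
    x_hat2, *_ = np.linalg.lstsq(A2, b2, rcond=None)
    assert np.allclose(A2.T @ A2 @ x_hat2, A2.T @ b2)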
1.5 Uniqueness of the least-squares solution

Theorem 2. Let A be an m × n matrix. The following statements are logically equivalent:

(i) The equation Ax = b has a unique least-squares solution for each b in R^m.
(ii) The columns of A are linearly independent.
(iii) The matrix A^T A is invertible.

When these statements are true, the least-squares solution x̂ is given by

x̂ = (A^T A)^{-1} A^T b.

When a least-squares solution x̂ is used to produce Ax̂ as an approximation to b, the distance from b to Ax̂ is called the least-squares error of this approximation.

1.6 Example

Given A and b as in the example of Section 1.3, determine the least-squares error in the least-squares solution of Ax = b.

Solution. We had

b = \begin{bmatrix} 2 \\ 0 \\ 11 \end{bmatrix} \quad \text{and} \quad Ax̂ = \begin{bmatrix} 4 & 0 \\ 0 & 2 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} 1 \\ 2 \end{bmatrix} = \begin{bmatrix} 4 \\ 4 \\ 3 \end{bmatrix}.

The error is given by

b - Ax̂ = \begin{bmatrix} 2 \\ 0 \\ 11 \end{bmatrix} - \begin{bmatrix} 4 \\ 4 \\ 3 \end{bmatrix} = \begin{bmatrix} -2 \\ -4 \\ 8 \end{bmatrix},

and

‖b - Ax̂‖ = \sqrt{4 + 16 + 64} = \sqrt{84}.

The least-squares error is \sqrt{84}. For any x in R^2, the distance between b and the vector Ax is at least \sqrt{84}. See the figure below; note that the least-squares solution x̂ itself does not appear in the figure.

[Figure: b, its approximation Ax̂ in Col A, and the distance ‖b - Ax̂‖ = \sqrt{84} between them.]

1.7 Alternative calculations of least-squares solutions

The next example shows how to find a least-squares solution of Ax = b when the columns of A are orthogonal. Such matrices often appear in linear regression problems, discussed in the next section.

1.8 Example

Find a least-squares solution of Ax = b for
A = \begin{bmatrix} 1 & -6 \\ 1 & -2 \\ 1 & 1 \\ 1 & 7 \end{bmatrix}, \qquad b = \begin{bmatrix} -1 \\ 2 \\ 1 \\ 6 \end{bmatrix}.

Solution. Because the columns a_1 and a_2 of A are orthogonal, the orthogonal projection of b onto Col A is given by

b̂ = \frac{b \cdot a_1}{a_1 \cdot a_1} a_1 + \frac{b \cdot a_2}{a_2 \cdot a_2} a_2 = \frac{8}{4} a_1 + \frac{45}{90} a_2 = 2 a_1 + \frac{1}{2} a_2,

so

b̂ = \begin{bmatrix} -1 \\ 1 \\ 5/2 \\ 11/2 \end{bmatrix}.

Now that b̂ is known, we can solve Ax̂ = b̂. But this is trivial, since we already know what weights to place on the columns of A to produce b̂. It is clear from the equation above that

x̂ = \begin{bmatrix} 2 \\ 1/2 \end{bmatrix}.

1.9 Numerical notes

In some cases, the normal equations for a least-squares problem can be ill-conditioned; that is, small errors in the calculation of the entries of A^T A can sometimes cause relatively large errors in the solution x̂. If the columns of A are linearly independent, the least-squares solution can often be computed more reliably through a QR factorization of A.

1.10 Least-squares and QR factorization

Theorem 3. Given an m × n matrix A with linearly independent columns, let A = QR be a QR factorization of A. Then, for each b in R^m, the equation Ax = b has a unique least-squares solution, given by

x̂ = R^{-1} Q^T b.

1.11 Example

Find the least-squares solution of Ax = b for

A = \begin{bmatrix} 1 & 3 & 5 \\ 1 & 1 & 0 \\ 1 & 1 & 2 \\ 1 & 3 & 3 \end{bmatrix}, \qquad b = \begin{bmatrix} 3 \\ 5 \\ 7 \\ -3 \end{bmatrix}.

Solution. The QR factorization of A can be obtained as follows:

Q = \frac{1}{2} \begin{bmatrix} 1 & 1 & 1 \\ 1 & -1 & -1 \\ 1 & -1 & 1 \\ 1 & 1 & -1 \end{bmatrix}, \qquad R = \begin{bmatrix} 2 & 4 & 5 \\ 0 & 2 & 3 \\ 0 & 0 & 2 \end{bmatrix}.
Then

Q^T b = \begin{bmatrix} 6 \\ -6 \\ 4 \end{bmatrix}.

Using back substitution to solve R x̂ = Q^T b, we obtain

x̂ = \begin{bmatrix} 10 \\ -6 \\ 2 \end{bmatrix}.
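As a check, the same computation can be reproduced numerically. The sketch below is illustrative only and assumes NumPy; np.linalg.qr returns the reduced QR factorization, and the short loop performs the back substitution. NumPy may return Q and R with some columns and rows negated relative to the hand computation above; the product QR and the least-squares solution are unaffected.

    import numpy as np

    # The matrix and right-hand side from the example in Section 1.11.
    A = np.array([[1.0, 3.0, 5.0],
                  [1.0, 1.0, 0.0],
                  [1.0, 1.0, 2.0],
                  [1.0, 3.0, 3.0]])
    b = np.array([3.0, 5.0, 7.0, -3.0])

    # Reduced QR factorization; signs of Q's columns (and R's rows) may differ
    # from the hand computation, which does not change the answer.
    Q, R = np.linalg.qr(A)

    # Solve R x = Q^T b by back substitution (R is upper triangular).
    y = Q.T @ b
    n = R.shape[0]
    x_hat = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x_hat[i] = (y[i] - R[i, i + 1:] @ x_hat[i + 1:]) / R[i, i]

    print(x_hat)                      # approximately [10, -6, 2]
    assert np.allclose(x_hat, np.linalg.lstsq(A, b, rcond=None)[0])

In practice the triangular system R x̂ = Q^T b is solved directly, as in the example, rather than forming R^{-1} explicitly.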