Least-Squares

A famous application of linear algebra is the case study provided by the 1974 update of the North American Datum (pp. 373-374), a geodetic survey that maintains accurate measurements of distances and elevations at 268,000 points across the topography of this continent. The data collected provided the input for the solution of a massive system of linear equations involving 900,000 variables and twice as many equations! What's more, the system determined by the measured data was inconsistent! In fact, this is not surprising: data from real-world measurements is prone to many kinds of errors and may lead to systems of equations that are formally unsolvable because of these errors. What is needed, then, is a method to deal with such obstacles. This is the main idea behind the method of least squares.

When a system of equations $A\mathbf{x} = \mathbf{b}$ with $m \times n$ coefficient matrix $A$ is inconsistent, it is because the vector $\mathbf{b}$ does not lie in the column space of $A$. It is often the case in practice that the system is inconsistent because its data, the coefficient matrix $A$ and the vector of constants $\mathbf{b}$, is inaccurate due to measurement error; that is, the system is almost consistent.
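To make this concrete, here is a minimal numpy sketch; the $3 \times 2$ matrix and the slightly perturbed right-hand side are hypothetical, chosen only for illustration. A system is consistent exactly when adjoining $\mathbf{b}$ to $A$ does not raise the rank, and the perturbed measurement breaks that.

```python
import numpy as np

# A hypothetical 3x2 system: three equations, two unknowns.
# The last measurement is slightly off, so b is not in Col A.
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0, 3.1])  # an exact fit would need 3.0 here

# b lies in Col A exactly when rank [A | b] == rank A.
print(np.linalg.matrix_rank(A))                        # 2
print(np.linalg.matrix_rank(np.column_stack([A, b])))  # 3 -> inconsistent
```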
This near-consistency suggests that the two vectors $A\mathbf{x}$ and $\mathbf{b}$ should be nearby in $\mathbb{R}^m$, and that any attempt to solve the unsolvable system $A\mathbf{x} = \mathbf{b}$ should instead look for a vector $\hat{\mathbf{x}}$ so that $A\hat{\mathbf{x}}$ is as close to $\mathbf{b}$ as possible. That is, we want to locate the vector $\hat{\mathbf{x}}$ for which

$$\|\mathbf{b} - A\hat{\mathbf{x}}\| \le \|\mathbf{b} - A\mathbf{x}\| \quad \text{for every } \mathbf{x} \in \mathbb{R}^n.$$

We can rephrase this condition by saying that we want to find the value $\hat{\mathbf{x}}$ of the vector $\mathbf{x}$ that minimizes the quantity $\|\mathbf{b} - \mathbf{y}\|$, where $\mathbf{y} = A\mathbf{x}$. But minimizing $\|\mathbf{b} - \mathbf{y}\|$ is the same as minimizing

$$\|\mathbf{b} - \mathbf{y}\|^2 = (b_1 - y_1)^2 + \cdots + (b_m - y_m)^2.$$

Because solving this new problem involves minimizing a quantity that is a sum of squares, the solution vector $\hat{\mathbf{x}}$ is called a least-squares solution of the inconsistent system $A\mathbf{x} = \mathbf{b}$. To find the least-squares solution of $A\mathbf{x} = \mathbf{b}$, we project $\mathbf{b}$ onto $\operatorname{Col} A$ to obtain the vector $\hat{\mathbf{b}} = \operatorname{proj}_{\operatorname{Col} A}\mathbf{b}$, the closest vector in $\operatorname{Col} A$ to $\mathbf{b}$. Since $\hat{\mathbf{b}}$ lies in $\operatorname{Col} A$, we can then find an $\hat{\mathbf{x}}$ that satisfies $A\hat{\mathbf{x}} = \hat{\mathbf{b}}$. This $\hat{\mathbf{x}}$ satisfies our least-squares condition.
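The following sketch carries the hypothetical system above through this recipe. It uses numpy's built-in least-squares routine to produce $\hat{\mathbf{x}}$ and the projection $\hat{\mathbf{b}} = A\hat{\mathbf{x}}$; the routine and the data are assumptions for illustration, not part of the text.

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0, 3.1])

# np.linalg.lstsq minimizes ||b - A x||^2 directly.
x_hat, residual, rank, _ = np.linalg.lstsq(A, b, rcond=None)

b_hat = A @ x_hat                  # the projection of b onto Col A
print(x_hat)                       # the least-squares solution
print(np.linalg.norm(b - b_hat))   # distance from b to Col A
```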
Notice that if $A\mathbf{x} = \mathbf{b}$ is consistent, then any vector $\hat{\mathbf{x}}$ that makes $A\hat{\mathbf{x}}$ as close to $\mathbf{b}$ as possible is an ordinary solution of the consistent system $A\mathbf{x} = \mathbf{b}$, i.e., the distance $\|\mathbf{b} - A\hat{\mathbf{x}}\|$ is zero. When the system $A\mathbf{x} = \mathbf{b}$ is inconsistent, the value of $\|\mathbf{b} - A\hat{\mathbf{x}}\|$ is positive; we call it the least-squares error associated with the system. The Orthogonal Decomposition Theorem says that since $\hat{\mathbf{b}}$ lies in $\operatorname{Col} A$, the vector $\mathbf{b} - A\hat{\mathbf{x}} = \mathbf{b} - \hat{\mathbf{b}}$ is orthogonal to $\operatorname{Col} A$. Therefore, $\mathbf{b} - A\hat{\mathbf{x}}$ is orthogonal to each of the column vectors $\mathbf{a}_j$ ($j = 1, 2, \ldots, n$) of the matrix $A$. We can express this relationship by writing $\mathbf{a}_j^T(\mathbf{b} - A\hat{\mathbf{x}}) = 0$. But the vectors $\mathbf{a}_j^T$ are the rows of $A^T$, so taken together these orthogonality conditions can be written in the form $A^T(\mathbf{b} - A\hat{\mathbf{x}}) = \mathbf{0}$, or more simply,

$$A^T A\hat{\mathbf{x}} = A^T\mathbf{b}. \tag{*}$$

Conversely, any solution of the system of equations (*) yields a vector $\mathbf{x} = \hat{\mathbf{x}}$ for which $\mathbf{b} - A\hat{\mathbf{x}}$ is orthogonal to each of the column vectors $\mathbf{a}_j$ of the matrix $A$, and hence is a least-squares solution of $A\mathbf{x} = \mathbf{b}$.
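Here is a short sketch of (*) in action on the same hypothetical system; it also checks numerically that the residual $\mathbf{b} - A\hat{\mathbf{x}}$ is orthogonal to the columns of $A$, as the derivation above predicts.

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0, 3.1])

# Form and solve the normal equations  A^T A x = A^T b.
x_hat = np.linalg.solve(A.T @ A, A.T @ b)

# The residual b - A x_hat is orthogonal to every column of A.
print(A.T @ (b - A @ x_hat))   # ~ [0, 0] up to round-off
```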
For this reason, we call the system of equations determined by (*) the normal equations for $A\mathbf{x} = \mathbf{b}$. Our discussion proves the

Theorem. The set of least-squares solutions of $A\mathbf{x} = \mathbf{b}$ is precisely the solution set of the system of normal equations $A^T A\mathbf{x} = A^T\mathbf{b}$. // (See Example 2, p. 412.)

In many cases, the least-squares problem $A\mathbf{x} = \mathbf{b}$ comes with a matrix $A$ having linearly independent columns, so that $A$ has a QR decomposition. We can exploit this property to compute the least-squares solution more quickly, as we now show.

Theorem. If $A$ is an $m \times n$ matrix with linearly independent columns and $A = QR$ is any QR decomposition of $A$, then the vector $\hat{\mathbf{x}} = R^{-1}Q^T\mathbf{b}$ is the only least-squares solution of the system $A\mathbf{x} = \mathbf{b}$. That is, the least-squares solution of $A\mathbf{x} = \mathbf{b}$ is the solution of the system $R\mathbf{x} = Q^T\mathbf{b}$.

Proof. Recall that the columns of $Q$ form an orthonormal basis for $\operatorname{Col} A$, so the projection of $\mathbf{b}$
onto $\operatorname{Col} A$ can be computed as $\hat{\mathbf{b}} = QQ^T\mathbf{b}$. Then

$$\hat{\mathbf{b}} = QQ^T\mathbf{b} = Q(RR^{-1})Q^T\mathbf{b} = (QR)(R^{-1}Q^T\mathbf{b}) = A(R^{-1}Q^T\mathbf{b})$$

shows that the vector $\hat{\mathbf{x}} = R^{-1}Q^T\mathbf{b}$ satisfies $A\hat{\mathbf{x}} = \hat{\mathbf{b}}$. That is, $\hat{\mathbf{x}} = R^{-1}Q^T\mathbf{b}$ is a least-squares solution of $A\mathbf{x} = \mathbf{b}$. Conversely, if $\hat{\mathbf{x}}$ is any least-squares solution of $A\mathbf{x} = \mathbf{b}$, then it satisfies the normal equations $A^T A\hat{\mathbf{x}} = A^T\mathbf{b}$. But since the columns of $Q$ are orthonormal, $Q^T Q = I$, and so $A^T A = (QR)^T QR = R^T Q^T Q R = R^T R$. Therefore, the normal equations take the form $R^T R\hat{\mathbf{x}} = (QR)^T\mathbf{b} = R^T Q^T\mathbf{b}$. Since $R$ is invertible, so is $R^T$, and multiplying first by the inverse of $R^T$ and then by the inverse of $R$ yields $\hat{\mathbf{x}} = R^{-1}Q^T\mathbf{b}$, showing that there is only one least-squares solution of $A\mathbf{x} = \mathbf{b}$. //
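A sketch of the QR route, again on the hypothetical system (numpy's reduced QR factorization is assumed here). Following the theorem, we solve $R\mathbf{x} = Q^T\mathbf{b}$ rather than forming $R^{-1}$ explicitly, which is both cheaper and numerically safer than inverting $R$.

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0, 3.1])

# Reduced QR: Q is m x n with orthonormal columns, R is n x n upper triangular.
Q, R = np.linalg.qr(A)

# Solve R x = Q^T b; R is invertible because the columns of A are independent.
x_hat = np.linalg.solve(R, Q.T @ b)
print(x_hat)   # matches the normal-equations solution
```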