Math 407: Linear Optimization
Lecture 16: The Linear Least Squares Problem II
Math Dept, University of Washington
February 28, 2018
Linear Least Squares

A linear least squares problem is one of the form
\[
  \text{LLS:}\qquad \min_{x \in \mathbb{R}^n} \tfrac{1}{2}\|Ax - b\|_2^2,
\]
where $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$, and $\|y\|_2^2 := y_1^2 + y_2^2 + \cdots + y_m^2$.

Theorem: Consider the linear least squares problem LLS.
1. A solution to the normal equations $A^T A x = A^T b$ always exists.
2. A solution to LLS always exists.
3. LLS has a unique solution if and only if $\mathrm{Null}(A) = \{0\}$, in which case $(A^T A)^{-1}$ exists and the unique solution is given by $\bar{x} = (A^T A)^{-1} A^T b$.
4. If $\mathrm{Ran}(A) = \mathbb{R}^m$, then $(A A^T)^{-1}$ exists and $\bar{x} = A^T (A A^T)^{-1} b$ solves LLS; indeed, $A\bar{x} = b$.
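As a quick numerical sanity check (not part of the original slides), here is a minimal NumPy sketch of item 3 of the theorem: for a made-up full-column-rank $A$, solving the normal equations reproduces NumPy's built-in least squares solution.

```python
import numpy as np

# Made-up overdetermined system: A is 5x2 with Null(A) = {0}.
A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0],
              [7.0, 8.0],
              [9.0, 1.0]])
b = np.array([1.0, 0.0, 2.0, 1.0, 3.0])

# Unique LLS solution via the normal equations: x = (A^T A)^{-1} A^T b.
x = np.linalg.solve(A.T @ A, A.T @ b)

# Cross-check against NumPy's least squares routine.
x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(x, x_ref))  # True
```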
Distance to a Subspace

Let $S \subset \mathbb{R}^m$ be a subspace and suppose $b \in \mathbb{R}^m$ is not in $S$. Find the point $\bar{z} \in S$ such that
\[
  \|\bar{z} - b\|_2 \le \|z - b\|_2 \quad \forall\, z \in S,
\]
or equivalently, solve
\[
  \text{D:}\qquad \min_{z \in S} \tfrac{1}{2}\|z - b\|_2^2.
\]

Least Squares Connection: Suppose $S = \mathrm{Ran}(A)$. Then $\bar{z} \in \mathbb{R}^m$ solves D if and only if there is an $\bar{x} \in \mathbb{R}^n$ with $\bar{z} = A\bar{x}$ such that $\bar{x}$ solves LLS.
Orthogonal Projections and Orthonormal Bases

Let $Q \in \mathbb{R}^{n \times k}$ be a matrix whose columns form an orthonormal basis for the subspace $S$, so that $k = \dim S$. Set $P = QQ^T$ and note that $Q^T Q = I_k$, the $k \times k$ identity matrix. Then
\[
  P^2 = QQ^T QQ^T = Q I_k Q^T = QQ^T = P
  \qquad\text{and}\qquad
  P^T = (QQ^T)^T = QQ^T = P,
\]
so that $P = P_S$, the orthogonal projection onto $S$!

Lemma: (1) The projection $P \in \mathbb{R}^{n \times n}$ is orthogonal if and only if $P = P^T$.
(2) If the columns of the matrix $Q \in \mathbb{R}^{n \times k}$ form an orthonormal basis for the subspace $S \subset \mathbb{R}^n$, then $P := QQ^T$ is the orthogonal projection onto $S$.
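A small numerical illustration of the lemma (made-up data, NumPy assumed): build an orthonormal basis $Q$ for a random two-dimensional subspace of $\mathbb{R}^4$ and verify that $P = QQ^T$ is idempotent and symmetric, and that residuals are orthogonal to $S$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Orthonormal basis for a random 2-dimensional subspace of R^4,
# obtained from the reduced QR factorization of a random 4x2 matrix.
Q, _ = np.linalg.qr(rng.standard_normal((4, 2)))

P = Q @ Q.T  # candidate orthogonal projection onto S = Ran(Q)

print(np.allclose(P @ P, P))   # idempotent: P^2 = P
print(np.allclose(P.T, P))     # self-adjoint: P^T = P
v = rng.standard_normal(4)
print(np.allclose(Q.T @ (v - P @ v), 0))  # residual v - Pv is orthogonal to S
```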
Orthogonal Projections and Distance to a Subspace

Theorem: Let $S \subset \mathbb{R}^m$ be a subspace and let $b \in \mathbb{R}^m \setminus S$. Then the unique solution to the least distance problem
\[
  \text{D:}\qquad \min_{z \in S} \|z - b\|_2
\]
is $\bar{z} := P_S b$, where $P_S$ is the orthogonal projector onto $S$.

Proposition: Let $A \in \mathbb{R}^{m \times n}$ with $m \ge n$ and $\mathrm{Null}(A) = \{0\}$. Then the orthogonal projector onto $\mathrm{Ran}(A)$ is given by
\[
  P_{\mathrm{Ran}(A)} = A(A^T A)^{-1} A^T.
\]
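The proposition ties the two results together: projecting $b$ onto $\mathrm{Ran}(A)$ gives $A\bar{x}$ for the LLS solution $\bar{x}$. A brief check on made-up data (NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(1)

# Made-up tall matrix with Null(A) = {0}.
A = rng.standard_normal((6, 3))
b = rng.standard_normal(6)

# Orthogonal projector onto Ran(A) via the proposition.
P = A @ np.linalg.solve(A.T @ A, A.T)

# The projection P b should equal A x for the LLS solution x.
x, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(P @ b, A @ x))  # True
```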
Minimal Norm Solutions to Ax = b

Let $A \in \mathbb{R}^{m \times n}$ and suppose that $m < n$. Then $A$ is short and fat, so $A$ most likely has rank $m$, or equivalently, $\mathrm{Ran}(A) = \mathbb{R}^m$. So for the purposes of this discussion we assume that $\mathrm{rank}(A) = m$.

Since $m < n$, the set of solutions to $Ax = b$ is infinite, since the nullity of $A$ is $n - m$. Indeed, if $x_0$ is any particular solution to $Ax = b$, then the set of solutions is given by
\[
  x_0 + \mathrm{Null}(A) := \{\, x_0 + z \mid z \in \mathrm{Null}(A) \,\}.
\]

In this setting, one might prefer the solution having least norm:
\[
  \min_{z \in \mathrm{Null}(A)} \tfrac{1}{2}\|z + x_0\|_2^2.
\]
This problem is of the form
\[
  \text{D:}\qquad \min\{\, \|z - b\|_2 \mid z \in S \,\},
\]
whose solution is $\bar{z} = P_S b$ when $S$ is a subspace. Consequently, the solution is given by $\bar{z} = P_S(-x_0)$, where $P_S$ is now the orthogonal projection onto $S := \mathrm{Null}(A)$.
A Formula for P_Null(A)

Observe that
\[
  P_{\mathrm{Null}(A)} = P_{\mathrm{Ran}(A^T)^\perp} = I - P_{\mathrm{Ran}(A^T)}.
\]

We have already shown that if $M \in \mathbb{R}^{p \times q}$ satisfies $\mathrm{Null}(M) = \{0\}$, then $P_{\mathrm{Ran}(M)} = M(M^T M)^{-1} M^T$. If we take $M = A^T$, then our assumption that $\mathrm{Ran}(A) = \mathbb{R}^m$ gives
\[
  \mathrm{Null}(M) = \mathrm{Null}(A^T) = \mathrm{Ran}(A)^\perp = (\mathbb{R}^m)^\perp = \{0\}.
\]
Hence,
\[
  P_{\mathrm{Ran}(A^T)} = A^T (A A^T)^{-1} A
  \qquad\text{and}\qquad
  P_{\mathrm{Null}(A)} = I - P_{\mathrm{Ran}(A^T)} = I - A^T (A A^T)^{-1} A.
\]
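A brief numerical check of this formula (not from the slides), with a made-up short, fat $A$ of full row rank:

```python
import numpy as np

rng = np.random.default_rng(2)

# Made-up short, fat matrix with full row rank m = 2.
A = rng.standard_normal((2, 5))

# Projector onto Null(A): P = I - A^T (A A^T)^{-1} A.
P = np.eye(5) - A.T @ np.linalg.solve(A @ A.T, A)

print(np.allclose(A @ P, 0))     # columns of P lie in Null(A)
print(np.allclose(P @ P, P))     # idempotent
print(np.allclose(P.T, P))       # symmetric
```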
Solution to D: $\min\{\, \|z - b\|_2 \mid z \in S \,\}$

We can take $x_0 := A^T (A A^T)^{-1} b$ as our particular solution to $Ax = b$ in D, since
\[
  A x_0 = A A^T (A A^T)^{-1} b = b.
\]

Hence, our solution to D is
\begin{align*}
  \bar{x} &= x_0 + P_{\mathrm{Null}(A)}(-x_0) \\
          &= x_0 + (I - A^T (A A^T)^{-1} A)(-x_0) \\
          &= A^T (A A^T)^{-1} A x_0 \\
          &= A^T (A A^T)^{-1} A A^T (A A^T)^{-1} b \\
          &= A^T (A A^T)^{-1} b = x_0.
\end{align*}
That is, $x_0 = A^T (A A^T)^{-1} b$ is the least norm solution to $Ax = b$.
Solution to D: $\min\{\, \|z - b\|_2 \mid z \in S \,\}$

Theorem: Let $A \in \mathbb{R}^{m \times n}$ be such that $m \le n$ and $\mathrm{Ran}(A) = \mathbb{R}^m$.
(1) The matrix $A A^T$ is invertible.
(2) The orthogonal projection onto $\mathrm{Null}(A)$ is given by $P_{\mathrm{Null}(A)} = I - A^T (A A^T)^{-1} A$.
(3) For every $b \in \mathbb{R}^m$, the system $Ax = b$ is consistent, and the least norm solution to this system is uniquely given by
\[
  \bar{x} = A^T (A A^T)^{-1} b.
\]
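A minimal NumPy sketch of the theorem on made-up data (not from the slides); for reference, the same least norm solution is produced by the Moore-Penrose pseudoinverse when $A$ has full row rank.

```python
import numpy as np

rng = np.random.default_rng(3)

# Made-up underdetermined system with full row rank.
A = rng.standard_normal((3, 7))
b = rng.standard_normal(3)

# Least norm solution: x = A^T (A A^T)^{-1} b.
x = A.T @ np.linalg.solve(A @ A.T, b)

print(np.allclose(A @ x, b))    # x solves Ax = b
x_pinv = np.linalg.pinv(A) @ b  # pseudoinverse solution
print(np.allclose(x, x_pinv))   # same least norm solution
```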
Gram-Schmidt Orthogonalization and the QR-Factorization

We now define the Gram-Schmidt orthogonalization process for a set of linearly independent vectors $a_1, \dots, a_n \in \mathbb{R}^m$. (This implies that $n \le m$ (why?).) The orthonormal vectors $q_1, \dots, q_n$ are defined inductively, as follows:
\[
  p_1 = a_1, \qquad q_1 = p_1 / \|p_1\|,
\]
\[
  p_j = a_j - \sum_{i=1}^{j-1} \langle a_j, q_i \rangle q_i
  \qquad\text{and}\qquad
  q_j = p_j / \|p_j\| \quad\text{for } 2 \le j \le n.
\]
For $1 \le j \le n$ we have $q_j \in \mathrm{Span}\{a_1, \dots, a_j\}$, so that $\mathrm{Span}\{q_1, \dots, q_{j-1}\} \subset \mathrm{Span}\{a_1, \dots, a_{j-1}\}$; hence $p_j \ne 0$ by the linear independence of $a_1, \dots, a_j$. An elementary induction argument shows that the $q_j$'s form an orthonormal basis for $\mathrm{span}(a_1, \dots, a_n)$.
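The process transcribes directly into code. The sketch below is classical Gram-Schmidt exactly as defined above (in floating point the modified variant is usually preferred for stability, but this mirrors the slide's definitions):

```python
import numpy as np

def gram_schmidt(A):
    """Classical Gram-Schmidt on the columns of A (assumed linearly
    independent). Returns Q with orthonormal columns and upper
    triangular R with A = Q R."""
    m, n = A.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for j in range(n):
        p = A[:, j].copy()
        for i in range(j):
            R[i, j] = A[:, j] @ Q[:, i]   # r_ij = <a_j, q_i>
            p -= R[i, j] * Q[:, i]
        R[j, j] = np.linalg.norm(p)       # r_jj = ||p_j|| > 0
        Q[:, j] = p / R[j, j]
    return Q, R

# Example on made-up data.
A = np.array([[1.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
Q, R = gram_schmidt(A)
print(np.allclose(Q.T @ Q, np.eye(2)))  # orthonormal columns
print(np.allclose(Q @ R, A))            # A = QR
```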
Gram-Schmidt Orthogonalization and the QR-Factorization

If we now define $r_{jj} = \|p_j\| \ne 0$ and $r_{ij} = \langle a_j, q_i \rangle$ for $1 \le i < j \le n$, then
\begin{align*}
  a_1 &= r_{11} q_1, \\
  a_2 &= r_{12} q_1 + r_{22} q_2, \\
  a_3 &= r_{13} q_1 + r_{23} q_2 + r_{33} q_3, \\
      &\;\;\vdots \\
  a_n &= \sum_{i=1}^{n} r_{in} q_i.
\end{align*}
Gram-Schmidt Orthogonalization and the QR-Factorization

Again suppose $a_1, \dots, a_n \in \mathbb{R}^m$ are linearly independent and set
\[
  A = [a_1\; a_2\; \dots\; a_n] \in \mathbb{R}^{m \times n}, \qquad
  R = [r_{ij}] \in \mathbb{R}^{n \times n}, \qquad
  Q = [q_1\; q_2\; \dots\; q_n] \in \mathbb{R}^{m \times n},
\]
where $r_{ij} = 0$ for $i > j$. Then $A = QR$, where $Q$ has orthonormal columns and $R$ is an upper triangular $n \times n$ matrix. In addition, $R$ is invertible since the diagonal entries $r_{jj}$ are non-zero. This is called the QR factorization of the matrix $A$.
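In practice one calls a library routine rather than hand-rolling Gram-Schmidt; for example, NumPy's reduced QR factorization (note that the computed $Q$ and $R$ may differ from the Gram-Schmidt ones by column signs):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((5, 3))  # made-up full-column-rank matrix

Q, R = np.linalg.qr(A)  # reduced QR: Q is 5x3, R is 3x3 upper triangular
print(np.allclose(Q @ R, A))            # A = QR
print(np.allclose(Q.T @ Q, np.eye(3)))  # orthonormal columns
print(np.allclose(R, np.triu(R)))       # R upper triangular
```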
Full QR-Factorization

Suppose $A \in \mathbb{R}^{m \times n}$ with $m \ge n$. Then there exist a permutation matrix $P \in \mathbb{R}^{n \times n}$, a unitary matrix $Q \in \mathbb{R}^{m \times m}$, and an upper triangular matrix $R \in \mathbb{R}^{m \times n}$ such that $AP = QR$. Let $Q_1 \in \mathbb{R}^{m \times n}$ denote the first $n$ columns of $Q$, $Q_2$ the remaining $m - n$ columns of $Q$, and $R_1 \in \mathbb{R}^{n \times n}$ the first $n$ rows of $R$. Then
\[
  AP = QR = [Q_1\; Q_2] \begin{bmatrix} R_1 \\ 0 \end{bmatrix} = Q_1 R_1. \tag{1}
\]
Full QR-Factorization

Moreover, we have the following:
(a) We may choose $R$ to have nonnegative diagonal entries.
(b) If $A$ is of full rank, then we can choose $R$ with positive diagonal entries, in which case we obtain the condensed factorization $A = Q_1 R_1$, where $R_1 \in \mathbb{R}^{n \times n}$ is invertible and the columns of $Q_1$ form an orthonormal basis for $\mathrm{Ran}(A)$.
(c) If $\mathrm{rank}(A) = k < n$, then
\[
  R_1 = \begin{bmatrix} R_{11} & R_{12} \\ 0 & 0 \end{bmatrix},
\]
where $R_{11}$ is a $k \times k$ invertible upper triangular matrix and $R_{12} \in \mathbb{R}^{k \times (n-k)}$. In particular, this implies that $AP = Q_{11}[R_{11}\; R_{12}]$, where $Q_{11}$ consists of the first $k$ columns of $Q$. In this case, the columns of $Q_{11}$ form an orthonormal basis for the range of $A$ and $R_{11}$ is invertible.
Condensed QR-Factorization

Let $A \in \mathbb{R}^{m \times n}$ have rank $k \le \min\{m, n\}$. Then there exist
\[
  Q \in \mathbb{R}^{m \times k} \text{ with orthonormal columns}, \qquad
  R \in \mathbb{R}^{k \times n} \text{ full rank upper triangular}, \qquad
  P \in \mathbb{R}^{n \times n} \text{ a permutation matrix}
\]
such that $AP = QR$. In particular, the columns of the matrix $Q$ form a basis for the range of $A$. Moreover, the matrix $R$ can be written in the form
\[
  R = [R_1\; R_2],
\]
where $R_1 \in \mathbb{R}^{k \times k}$ is nonsingular.
Orthogonal Projections onto the Four Fundamental Subspaces

Let $A \in \mathbb{R}^{m \times n}$ have rank $k \le \min\{m, n\}$, and let $A$ and $A^T$ have generalized QR factorizations
\[
  AP = Q[R_1\; R_2] \qquad\text{and}\qquad A^T \tilde{P} = \tilde{Q}[\tilde{R}_1\; \tilde{R}_2].
\]
Since row rank equals column rank, $P \in \mathbb{R}^{n \times n}$ and $\tilde{P} \in \mathbb{R}^{m \times m}$ are permutation matrices, $Q \in \mathbb{R}^{m \times k}$ and $\tilde{Q} \in \mathbb{R}^{n \times k}$ have orthonormal columns, $R_1, \tilde{R}_1 \in \mathbb{R}^{k \times k}$ are both upper triangular nonsingular matrices, $R_2 \in \mathbb{R}^{k \times (n-k)}$, and $\tilde{R}_2 \in \mathbb{R}^{k \times (m-k)}$. Moreover,
- $QQ^T$ is the orthogonal projection onto $\mathrm{Ran}(A)$,
- $I - QQ^T$ is the orthogonal projection onto $\mathrm{Null}(A^T)$,
- $\tilde{Q}\tilde{Q}^T$ is the orthogonal projection onto $\mathrm{Ran}(A^T)$, and
- $I - \tilde{Q}\tilde{Q}^T$ is the orthogonal projection onto $\mathrm{Null}(A)$.
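These projectors can be computed from pivoted QR factorizations, for instance with SciPy. The sketch below uses a made-up rank-deficient matrix; the helper name and the tolerance-based rank estimate are choices of this illustration, not part of the slides.

```python
import numpy as np
from scipy.linalg import qr

rng = np.random.default_rng(5)

# Made-up rank-2 matrix: A is 5x4 with rank k = 2.
A = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 4))

def range_projector(M, tol=1e-10):
    """Orthogonal projector onto Ran(M) via QR with column pivoting."""
    Q, R, piv = qr(M, mode='economic', pivoting=True)
    k = np.sum(np.abs(np.diag(R)) > tol)  # numerical rank
    Qk = Q[:, :k]                         # orthonormal basis for Ran(M)
    return Qk @ Qk.T

P_ran_A = range_projector(A)              # onto Ran(A)
P_null_AT = np.eye(5) - P_ran_A           # onto Null(A^T)
P_ran_AT = range_projector(A.T)           # onto Ran(A^T)
P_null_A = np.eye(4) - P_ran_AT           # onto Null(A)

print(np.allclose(P_ran_A @ A, A))        # columns of A lie in Ran(A)
print(np.allclose(A @ P_null_A, 0))       # A annihilates Null(A)
```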
Solving the Normal Equations with the QR Factorization

Let $A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^m$ with $k := \mathrm{rank}(A) \le \min\{m, n\}$. The normal equations $A^T A x = A^T b$ yield solutions to the LLS problem $\min \tfrac{1}{2}\|Ax - b\|_2^2$. We show how the QR factorization of $A$ is used to solve these equations.

Suppose $A$ has condensed QR factorization $AP = Q[R_1\; R_2]$, where $P \in \mathbb{R}^{n \times n}$ is a permutation matrix, $Q \in \mathbb{R}^{m \times k}$ has orthonormal columns, $R_1 \in \mathbb{R}^{k \times k}$ is nonsingular and upper triangular, and $R_2 \in \mathbb{R}^{k \times (n-k)}$ with $k = \mathrm{rank}(A) \le \min\{n, m\}$.
Solving the Normal Equations with the QR Factorization

Since $A = Q[R_1\; R_2]P^T$, the normal equations $A^T A x = A^T b$ become
\[
  P \begin{bmatrix} R_1^T \\ R_2^T \end{bmatrix} Q^T Q [R_1\; R_2] P^T x
  = A^T A x = A^T b
  = P \begin{bmatrix} R_1^T \\ R_2^T \end{bmatrix} Q^T b,
\]
or equivalently, since $Q^T Q = I_k$,
\[
  P \begin{bmatrix} R_1^T \\ R_2^T \end{bmatrix} [R_1\; R_2] P^T x
  = P \begin{bmatrix} R_1^T \\ R_2^T \end{bmatrix} Q^T b.
\]
Solving the Normal Equations with the QR Factorization

Multiply
\[
  P \begin{bmatrix} R_1^T \\ R_2^T \end{bmatrix} [R_1\; R_2] P^T x
  = P \begin{bmatrix} R_1^T \\ R_2^T \end{bmatrix} Q^T b
\]
on the left by $P^T$ to obtain
\[
  \begin{bmatrix} R_1^T \\ R_2^T \end{bmatrix} [R_1\; R_2] P^T x
  = \begin{bmatrix} R_1^T \\ R_2^T \end{bmatrix} Q^T b.
\]
Defining $w := [R_1\; R_2] P^T x$ and setting $\hat{b} := Q^T b$ gives
\[
  \begin{bmatrix} R_1^T \\ R_2^T \end{bmatrix} w
  = \begin{bmatrix} R_1^T \\ R_2^T \end{bmatrix} \hat{b}.
\]
Take $w = \hat{b}$ and define $\hat{x} := P^T x$. Then $[R_1\; R_2]\hat{x} = \hat{b}$.
Solving the Normal Equations with the QR Factorization

We now need to solve the following equation for $\hat{x}$:
\[
  [R_1\; R_2]\hat{x} = \hat{b}.
\]
To do this, write
\[
  \hat{x} = \begin{pmatrix} \hat{x}_1 \\ \hat{x}_2 \end{pmatrix}
  \quad\text{with } \hat{x}_1 \in \mathbb{R}^k.
\]
If we assume that $\hat{x}_2 = 0$, then
\[
  R_1 \hat{x}_1 = [R_1\; R_2]\begin{pmatrix} \hat{x}_1 \\ 0 \end{pmatrix} = \hat{b},
\]
which we can solve by taking $\hat{x}_1 = R_1^{-1}\hat{b}$, since $R_1$ is an invertible upper triangular $k \times k$ matrix by construction.
Solving the Normal Equations with the QR Factorization

Then
\[
  x = P\hat{x} = P\begin{pmatrix} R_1^{-1}\hat{b} \\ 0 \end{pmatrix}.
\]
Does this $x$ solve the normal equations?
\begin{align*}
  A^T A x &= A^T A P \begin{pmatrix} R_1^{-1}\hat{b} \\ 0 \end{pmatrix} \\
  &= A^T Q [R_1\; R_2] P^T P \begin{pmatrix} R_1^{-1}\hat{b} \\ 0 \end{pmatrix}
     && \big(\text{since } AP = Q[R_1\; R_2]\big) \\
  &= A^T Q R_1 R_1^{-1}\hat{b} && (\text{since } P^T P = I) \\
  &= A^T Q \hat{b} = A^T Q Q^T b \\
  &= P \begin{bmatrix} R_1^T \\ R_2^T \end{bmatrix} Q^T Q Q^T b
     && \Big(\text{since } A^T = P \begin{bmatrix} R_1^T \\ R_2^T \end{bmatrix} Q^T\Big) \\
  &= P \begin{bmatrix} R_1^T \\ R_2^T \end{bmatrix} Q^T b && (\text{since } Q^T Q = I) \\
  &= A^T b\,!
\end{align*}
Solving the Normal Equations with the QR Factorization

This derivation yields the following recipe for solving the normal equations:
1. $AP = Q[R_1\; R_2]$: the general condensed QR factorization, $O(m^2 n)$.
2. $\hat{b} = Q^T b$: a matrix-vector product, $O(km)$.
3. $w_1 = R_1^{-1}\hat{b}$: a back solve, $O(k^2)$.
4. $x = P\begin{pmatrix} w_1 \\ 0 \end{pmatrix}$: a matrix-vector product, $O(kn)$.
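Putting the recipe together, here is a sketch using SciPy's column-pivoted QR; the helper name lls_via_qr and the rank tolerance are illustrative choices, not part of the slides.

```python
import numpy as np
from scipy.linalg import qr, solve_triangular

def lls_via_qr(A, b, tol=1e-10):
    """Solve the normal equations A^T A x = A^T b via a condensed,
    column-pivoted QR factorization AP = Q [R1 R2], following the
    recipe above. The rank tolerance is an assumption of this demo."""
    m, n = A.shape
    Q, R, piv = qr(A, mode='economic', pivoting=True)  # A[:, piv] = Q R
    k = np.sum(np.abs(np.diag(R)) > tol)               # numerical rank
    b_hat = Q[:, :k].T @ b                             # b_hat = Q^T b
    w1 = solve_triangular(R[:k, :k], b_hat)            # back solve R1 w1 = b_hat
    x_hat = np.concatenate([w1, np.zeros(n - k)])      # (w1; 0)
    x = np.empty(n)
    x[piv] = x_hat                                     # x = P x_hat
    return x

# Made-up rank-deficient example.
rng = np.random.default_rng(6)
A = rng.standard_normal((6, 3)) @ rng.standard_normal((3, 5))  # rank 3
b = rng.standard_normal(6)
x = lls_via_qr(A, b)
print(np.allclose(A.T @ A @ x, A.T @ b))  # x solves the normal equations
```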