Linear Analysis Lecture 16
The QR Factorization

Recall the Gram-Schmidt orthogonalization process. Let $V$ be an inner product space, and suppose $a_1, \dots, a_n \in V$ are linearly independent. Define $q_1, \dots, q_n$ inductively, as follows: set
\[ p_1 = a_1, \qquad q_1 = p_1 / \|p_1\|, \]
and for $2 \le j \le n$,
\[ p_j = a_j - \sum_{i=1}^{j-1} \langle a_j, q_i \rangle\, q_i, \qquad q_j = p_j / \|p_j\|. \]
Each $q_j \in \operatorname{Span}\{a_1, \dots, a_j\}$, so each $p_j \ne 0$ by the linear independence of $\{a_1, \dots, a_n\}$. Thus each $q_j$ is well-defined. The set $\{q_1, \dots, q_n\}$ is an orthonormal basis for $\operatorname{Span}\{a_1, \dots, a_n\}$. Also $a_k \in \operatorname{Span}\{q_1, \dots, q_k\}$ for $1 \le k \le n$, so $\{q_1, \dots, q_k\}$ is an orthonormal basis of $\operatorname{Span}\{a_1, \dots, a_k\}$.
The QR Factorization

Define $r_{jj} = \|p_j\|$ and $r_{ij} = \langle a_j, q_i \rangle$ for $1 \le i < j \le n$. Then
\[ a_1 = r_{11} q_1, \quad a_2 = r_{12} q_1 + r_{22} q_2, \quad a_3 = r_{13} q_1 + r_{23} q_2 + r_{33} q_3, \quad \dots, \quad a_n = \sum_{i=1}^{n} r_{in} q_i. \]
Set $A = [a_1 \; a_2 \; \cdots \; a_n]$, $R = [r_{ij}]$ with $r_{ij} = 0$ for $i > j$, and $Q = [q_1 \; q_2 \; \cdots \; q_n]$. Then $A = QR$, where $Q$ is unitary and $R$ is upper triangular.
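The construction above translates directly into code. The following NumPy sketch (not part of the lecture; the function name gram_schmidt_qr and the test matrix are illustrative) computes the classical Gram-Schmidt QR of a matrix with linearly independent columns:

    import numpy as np

    def gram_schmidt_qr(A):
        """Classical Gram-Schmidt: returns Q with orthonormal columns and
        upper triangular R with A = Q R (A assumed to have full column rank)."""
        m, n = A.shape
        Q = np.zeros((m, n), dtype=complex)
        R = np.zeros((n, n), dtype=complex)
        for j in range(n):
            p = A[:, j].astype(complex)
            for i in range(j):
                R[i, j] = np.vdot(Q[:, i], A[:, j])      # r_ij = <a_j, q_i> = q_i^H a_j
                p -= R[i, j] * Q[:, i]
            R[j, j] = np.linalg.norm(p)                  # r_jj = ||p_j||
            Q[:, j] = p / R[j, j]
        return Q, R

    A = np.random.randn(5, 3)
    Q, R = gram_schmidt_qr(A)
    print(np.allclose(A, Q @ R))                         # A = QR
    print(np.allclose(Q.conj().T @ Q, np.eye(3)))        # orthonormal columns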
The QR Factorization: Remarks

(1) If $a_1, a_2, \dots$ is a linearly independent sequence, apply Gram-Schmidt to obtain an orthonormal sequence $q_1, q_2, \dots$ such that $\{q_1, \dots, q_k\}$ is an orthonormal basis for $\operatorname{Span}\{a_1, \dots, a_k\}$ for every $k \ge 1$.

(2) If the $a_j$'s are linearly dependent, then for some value(s) of $k$, $a_k \in \operatorname{Span}\{a_1, \dots, a_{k-1}\}$, so $p_k = 0$. The process can be modified by setting $r_{kk} = 0$, not defining a new $q_k$ at this iteration, and then proceeding as usual. We end up with orthonormal $q_j$'s. Then for $k \ge 1$, the vectors $\{q_1, \dots, q_k\}$ form an orthonormal basis for $\operatorname{Span}\{a_1, \dots, a_{k+l}\}$, where $l$ is the number of zero diagonal entries $r_{jj}$ encountered so far. Again we obtain $A = QR$, but now $Q$ may not be square.
The QR Factorization: Remarks

(3) The classical Gram-Schmidt algorithm has poor computational performance due to the accumulation of round-off error. The computed $q_j$'s are not orthogonal: $\langle q_j, q_k \rangle$ is small for $j \ne k$ with $|k - j|$ small, but not for $|k - j|$ big.

Classical Gram-Schmidt:
    For $j = 1, \dots, n$ do
        $p := a_j$
        For $i = 1, \dots, j-1$ do
            $r_{ij} := \langle a_j, q_i \rangle$
            $p := p - r_{ij} q_i$
        $r_{jj} := \|p\|$
        $q_j := p / r_{jj}$

Modified Gram-Schmidt:
    For $j = 1, \dots, n$ do
        $p := a_j$
        For $i = 1, \dots, j-1$ do
            $r_{ij} := \langle p, q_i \rangle$
            $p := p - r_{ij} q_i$
        $r_{jj} := \|p\|$
        $q_j := p / r_{jj}$

The only difference is in the computation of $r_{ij}$: in the modified algorithm we orthogonalize the accumulated partial sum for $p_j$ against each $q_i$ successively.
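A small numerical experiment makes remark (3) concrete. The sketch below (not from the lecture) runs both variants on a Hilbert matrix, chosen only as a convenient ill-conditioned example, and measures the loss of orthogonality $\|Q^T Q - I\|$:

    import numpy as np

    def gram_schmidt(A, modified=True):
        """QR via Gram-Schmidt; modified=False gives the classical variant."""
        m, n = A.shape
        Q = np.zeros((m, n))
        R = np.zeros((n, n))
        for j in range(n):
            p = A[:, j].copy()
            for i in range(j):
                # classical: r_ij from the original column a_j; modified: from p
                v = p if modified else A[:, j]
                R[i, j] = Q[:, i] @ v
                p -= R[i, j] * Q[:, i]
            R[j, j] = np.linalg.norm(p)
            Q[:, j] = p / R[j, j]
        return Q, R

    n = 12
    A = np.array([[1.0 / (i + j + 1) for j in range(n)] for i in range(n)])  # Hilbert matrix
    for modified in (False, True):
        Q, _ = gram_schmidt(A, modified)
        name = "modified " if modified else "classical"
        print(name, np.linalg.norm(Q.T @ Q - np.eye(n)))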
The QR Factorization

Proposition. Suppose $A \in \mathbb{C}^{m \times n}$ with $m \ge n$. Then there exist a unitary $Q \in \mathbb{C}^{m \times m}$ and an upper triangular $R \in \mathbb{C}^{m \times n}$ for which $A = QR$. If $\hat{Q} \in \mathbb{C}^{m \times n}$ denotes the first $n$ columns of $Q$ and $\hat{R} \in \mathbb{C}^{n \times n}$ denotes the first $n$ rows of $R$, then
\[ A = QR = Q \begin{bmatrix} \hat{R} \\ 0 \end{bmatrix} = \hat{Q} \hat{R}. \]
Moreover:
(a) We may choose $R$ to have nonnegative diagonal entries.
(b) If $A$ is of full rank, then we can choose $\hat{R}$ with positive diagonal entries, in which case the condensed factorization $A = \hat{Q}\hat{R}$ is unique.
(c) If $A$ is of full rank, the condensed factorization $A = \hat{Q}\hat{R}$ is essentially unique: if $A = \hat{Q}_1 \hat{R}_1 = \hat{Q}_2 \hat{R}_2$, then there is a unitary diagonal matrix $D \in \mathbb{C}^{n \times n}$ for which $\hat{Q}_2 = \hat{Q}_1 D^H$, rescaling the columns of $\hat{Q}_1$, and $\hat{R}_2 = D \hat{R}_1$, rescaling the rows of $\hat{R}_1$.
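For reference (a library illustration, not part of the lecture), NumPy exposes both the full and the condensed factorizations through numpy.linalg.qr; NumPy does not enforce a sign convention on the diagonal of $R$, so the rescaling in parts (b)-(c) must be done by hand:

    import numpy as np

    A = np.random.randn(6, 4) + 1j * np.random.randn(6, 4)

    # Full factorization: Q is 6x6 unitary, R is 6x4 upper triangular.
    Q, R = np.linalg.qr(A, mode="complete")
    print(Q.shape, R.shape, np.allclose(A, Q @ R))

    # Condensed ("reduced") factorization: Qhat is 6x4, Rhat is 4x4.
    Qhat, Rhat = np.linalg.qr(A, mode="reduced")
    print(Qhat.shape, Rhat.shape, np.allclose(A, Qhat @ Rhat))

    # Rescale by a unitary diagonal D to get the unique factorization of (b),
    # i.e. positive diagonal entries (A here has full rank almost surely).
    d = np.diag(Rhat)
    D = np.diag(d / np.abs(d))
    Qpos, Rpos = Qhat @ D, D.conj().T @ Rhat
    print(np.allclose(A, Qpos @ Rpos), np.all(np.diag(Rpos).real > 0))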
The QR Factorization: Proof

If $A$ has full rank, apply Gram-Schmidt to its columns. Define $\hat{Q} = [q_1, \dots, q_n] \in \mathbb{C}^{m \times n}$ and $\hat{R} = [r_{ij}] \in \mathbb{C}^{n \times n}$ as above, so $A = \hat{Q}\hat{R}$. Extend $\{q_1, \dots, q_n\}$ to an orthonormal basis $\{q_1, \dots, q_m\}$ of $\mathbb{C}^m$, and set $Q = [q_1, \dots, q_m]$ and $R = \begin{bmatrix} \hat{R} \\ 0 \end{bmatrix} \in \mathbb{C}^{m \times n}$, so $A = QR$. Since $r_{jj} > 0$ in the Gram-Schmidt process, we have (b). Uniqueness follows by induction, passing through the Gram-Schmidt process again and noting that at each step we have no choice. (c) follows easily from (b).
The QR Factorization: Remarks

(1) If $A \in \mathbb{R}^{m \times n}$, everything can be done in real arithmetic, so, e.g., $Q \in \mathbb{R}^{m \times m}$ is orthogonal and $R \in \mathbb{R}^{m \times n}$ is real and upper triangular.

(2) In practice, there are more efficient and more numerically stable ways of computing the $Q$ and $R$ factors. The idea is to create zeros below the diagonal (successively in columns $1, 2, \dots$) as in Gaussian elimination, except that we now use Householder transformations (which are unitary) instead of the unit lower triangular matrices $L_j$.

(3) A QR factorization is also possible when $m < n$: $A = Q [R_1 \; R_2]$, where $Q \in \mathbb{C}^{m \times m}$ is unitary and $R_1 \in \mathbb{C}^{m \times m}$ is upper triangular.
The QR Factorization

Every $A \in \mathbb{C}^{m \times n}$ has a QR-factorization, even when $m < n$. Indeed, if $\operatorname{rank}(A) = k$, there always exist $Q \in \mathbb{C}^{m \times k}$ with orthonormal columns, a full-rank upper triangular $R \in \mathbb{C}^{k \times n}$, and a permutation matrix $P \in \mathbb{R}^{n \times n}$ such that
\[ AP = QR. \]
Moreover, if $A$ has rank $n$ (so $m \ge n$), then $R \in \mathbb{C}^{n \times n}$ is nonsingular. On the other hand, if $m < n$, then
\[ R = \begin{bmatrix} R_1 & R_2 \end{bmatrix}, \]
where $R_1 \in \mathbb{C}^{k \times k}$ is nonsingular (equivalently, in the full factorization with $Q \in \mathbb{C}^{m \times m}$ unitary, the last $m - k$ rows of $R$ are zero). Finally, if $A \in \mathbb{R}^{m \times n}$, the same facts hold, but now both $Q$ and $R$ can be chosen to be real matrices.
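SciPy provides a column-pivoted QR (shown here as an illustration, not part of the lecture); it returns "economic" factors rather than the rank-$k$ condensed ones, and the permutation as an index vector:

    import numpy as np
    from scipy.linalg import qr

    # A rank-deficient 3x5 example (rank 2), chosen only for illustration.
    A = np.array([[1., 2., 3., 4., 5.],
                  [2., 4., 6., 8., 10.],
                  [1., 0., 1., 0., 1.]])

    Q, R, piv = qr(A, mode="economic", pivoting=True)
    P = np.eye(A.shape[1])[:, piv]          # permutation matrix with A P = Q R
    print(np.allclose(A @ P, Q @ R))
    print(np.abs(np.diag(R)))               # diagonal magnitudes are non-increasing

    # Keeping only the first k = rank(A) columns of Q and rows of R gives the
    # condensed rank-k factors described above.
    k = np.linalg.matrix_rank(A)
    print(np.allclose(A @ P, Q[:, :k] @ R[:k, :]))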
The QR-Factorization and Orthogonal Projections

Let $A \in \mathbb{C}^{m \times n}$ have condensed QR-factorization $A = \hat{Q}\hat{R}$. Then by construction the columns of $\hat{Q}$ form an orthonormal basis for the range of $A$. Hence $P = \hat{Q}\hat{Q}^H$ is the orthogonal projector onto the range of $A$. Similarly, if the condensed QR-factorization of $A^H$ is $A^H = \hat{Q}_1 \hat{R}_1$, then $P_1 = \hat{Q}_1 \hat{Q}_1^H$ is the orthogonal projector onto $\operatorname{ran}(A^H) = \ker(A)^\perp$, and so $I - \hat{Q}_1 \hat{Q}_1^H$ is the orthogonal projector onto $\ker(A)$.

The QR-factorization can be computed using either Givens rotations or Householder reflections. Although the approach via rotations is arguably more stable numerically, it is more difficult to describe, so we only illustrate the approach using Householder reflections.
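As a sketch (not from the lecture), these projectors can be formed from the computed Q factors; column pivoting is used below so that the example also works when $A$ is rank deficient:

    import numpy as np
    from scipy.linalg import qr

    rng = np.random.default_rng(0)
    A = rng.standard_normal((6, 3)) @ rng.standard_normal((3, 4))   # 6x4, rank 3
    k = np.linalg.matrix_rank(A)

    # Orthonormal basis for ran(A): first k columns of a pivoted QR of A.
    Q, _, _ = qr(A, mode="economic", pivoting=True)
    P_range = Q[:, :k] @ Q[:, :k].conj().T          # projector onto ran(A)

    # Same construction for A^H gives ran(A^H) = ker(A)^perp, hence ker(A).
    Q1, _, _ = qr(A.conj().T, mode="economic", pivoting=True)
    P_ker = np.eye(A.shape[1]) - Q1[:, :k] @ Q1[:, :k].conj().T     # projector onto ker(A)

    print(np.allclose(P_range @ A, A))   # P_range fixes every column of A
    print(np.allclose(A @ P_ker, 0))     # ran(P_ker) lies in ker(A)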
QR using Householder Reflections

Recall that to any nonzero $w \in \mathbb{C}^n$ we can associate the Householder reflection
\[ U = I - 2\, \frac{w w^H}{w^H w}, \]
which reflects $\mathbb{C}^n$ about the hyperplane $\operatorname{Span}\{w\}^\perp$. Given a pair of non-zero vectors $x$ and $y$ with $\|x\|_2 = \|y\|_2$, $x \ne y$, and $y^H x$ real, there is a Householder reflection such that $y = Ux$:
\[ U = I - 2\, \frac{(x - y)(x - y)^H}{(x - y)^H (x - y)}. \]
Proof:
\[ Ux = x - 2 (x - y)\, \frac{(x - y)^H x}{\|x\|^2 - 2\, y^H x + \|y\|^2} = x - 2 (x - y)\, \frac{\|x\|^2 - y^H x}{2 (\|x\|^2 - y^H x)} = x - (x - y) = y, \]
since $\|x\| = \|y\|$.
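A direct translation of this construction (a sketch; the helper householder_to is not from the lecture):

    import numpy as np

    def householder_to(x, y):
        """Householder reflection U with U x = y, assuming ||x|| = ||y||,
        x != y, and y^H x real (the hypotheses above)."""
        w = x - y
        return np.eye(len(x)) - 2.0 * np.outer(w, w.conj()) / np.vdot(w, w)

    # Typical use in QR: reflect x onto a multiple of e_1 of the same norm.
    x = np.array([3.0, 4.0, 0.0, 2.0])
    y = np.zeros_like(x)
    y[0] = -np.sign(x[0]) * np.linalg.norm(x)       # sign chosen to avoid cancellation
    U = householder_to(x, y)
    print(U @ x)                                    # approximately (-||x||, 0, 0, 0)
    print(np.allclose(U @ U.conj().T, np.eye(4)))   # U is unitary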
QR using Householder Reflections

We now describe the basic deflation step in the QR-factorization. Suppose
\[ A_0 = \begin{bmatrix} \alpha_0 & a_0^T \\ b_0 & \hat{A}_0 \end{bmatrix}, \qquad \nu_0 = \left\| \begin{pmatrix} \alpha_0 \\ b_0 \end{pmatrix} \right\|_2. \]
Let $H_0$ be the Householder transformation that maps $\begin{pmatrix} \alpha_0 \\ b_0 \end{pmatrix} \mapsto \nu_0 e_1$:
\[ H_0 = I - 2\, \frac{w w^T}{w^T w}, \qquad \text{where } w = \begin{pmatrix} \alpha_0 \\ b_0 \end{pmatrix} - \nu_0 e_1. \]
Thus
\[ H_0 A_0 = \begin{bmatrix} \nu_0 & a_1^T \\ 0 & A_1 \end{bmatrix}. \]
A problem occurs if $\begin{pmatrix} \alpha_0 \\ b_0 \end{pmatrix} = 0$, i.e. $\nu_0 = 0$. In this case, permute the offending column to the right, bringing in the column of greatest magnitude. Now repeat with $A_1$.

If this method is implemented by always permuting the column of greatest magnitude into the current pivot column, then $AP = QR$ gives a QR-factorization with the diagonal entries of $R$ nonnegative and listed in order of descending magnitude.
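Putting the deflation steps together gives a complete Householder QR. The sketch below (real case, no column pivoting, full column rank assumed; not the lecture's pivoted variant) uses the standard sign choice, so the diagonal of $R$ may be negative rather than nonnegative:

    import numpy as np

    def householder_qr(A):
        """QR by repeated Householder deflation (real, full column rank assumed)."""
        m, n = A.shape
        R = A.astype(float)
        Q = np.eye(m)
        for j in range(n):
            x = R[j:, j]
            nu = np.linalg.norm(x)
            nu = -nu if x[0] > 0 else nu        # avoid cancellation in w = x - nu*e_1
            w = x.copy()
            w[0] -= nu
            w /= np.linalg.norm(w)
            H = np.eye(m - j) - 2.0 * np.outer(w, w)   # Householder block H_j
            R[j:, j:] = H @ R[j:, j:]                  # create zeros below the diagonal
            Q[:, j:] = Q[:, j:] @ H                    # accumulate Q = H_0 H_1 ...
        return Q, R

    A = np.random.randn(6, 4)
    Q, R = householder_qr(A)
    print(np.allclose(A, Q @ R), np.allclose(np.tril(R, -1), 0, atol=1e-12))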
QR for Solving Least Squares Problems

Suppose $A \in \mathbb{C}^{m \times n}$, $b \in \mathbb{C}^m$, and $m \ge n$ with $\operatorname{rank}(A) = n$. Consider the least squares problem
\[ \min_{x \in \mathbb{C}^n} \|b - Ax\|_2. \]
Let $A = QR$ be a QR factorization of $A$, with condensed form $A = Q_1 \hat{R}$, so that $\hat{R} \in \mathbb{C}^{n \times n}$ is nonsingular. Write $Q = [Q_1 \; Q_2]$, where $Q_2 \in \mathbb{C}^{m \times (m - n)}$. Then
\[
\|b - Ax\|_2^2 = \|b - QRx\|_2^2 = \|Q^H b - Rx\|_2^2
= \left\| \begin{bmatrix} Q_1^H b \\ Q_2^H b \end{bmatrix} - \begin{bmatrix} \hat{R} \\ 0 \end{bmatrix} x \right\|_2^2
= \left\| \begin{bmatrix} Q_1^H b - \hat{R} x \\ Q_2^H b \end{bmatrix} \right\|_2^2
= \|Q_1^H b - \hat{R} x\|_2^2 + \|Q_2^H b\|_2^2.
\]
Here $\hat{R} \in \mathbb{C}^{n \times n}$ is an invertible upper triangular matrix, so
\[ \bar{x} \text{ solves } \min_{x \in \mathbb{C}^n} \|b - Ax\|_2 \iff \hat{R}\bar{x} = Q_1^H b. \]
This system can be solved by back-substitution. Note that we only need $Q_1$ and $\hat{R}$ to solve for $\bar{x}$.
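In code (a sketch, not from the lecture; scipy.linalg.solve_triangular performs the back-substitution):

    import numpy as np
    from scipy.linalg import solve_triangular

    rng = np.random.default_rng(1)
    A = rng.standard_normal((8, 3))                  # m >= n, full column rank
    b = rng.standard_normal(8)

    Q1, Rhat = np.linalg.qr(A, mode="reduced")       # condensed QR: A = Q1 Rhat
    x = solve_triangular(Rhat, Q1.conj().T @ b)      # back-substitution: Rhat x = Q1^H b

    print(np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0]))   # agrees with lstsq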
QR for Solving Least Squares Problems

Next suppose $A \in \mathbb{C}^{m \times n}$, $b \in \mathbb{C}^m$, and $m < n$ with $\operatorname{rank}(A) = m$, so that there are many solutions to $Ax = b$. Consider the least squares problem $\min \|b - Ax\|_2$. In this case $A$ is surjective, so there exists $\bar{x} \in \mathbb{C}^n$ such that $A\bar{x} = b$. Let $A^H = QR$ be a QR factorization of $A^H$, with condensed form $A^H = \hat{Q}\hat{R}$, so that $\hat{R} \in \mathbb{C}^{m \times m}$ is nonsingular and $\hat{Q}^H \hat{Q} = I_{m \times m}$. Now solve
\[ Ax = \hat{R}^H \hat{Q}^H x = b. \]
To do this, first solve $\hat{R}^H y = b$ by forward substitution ($\hat{R}^H$ is nonsingular and lower triangular), and set $\bar{x} = \hat{Q} y$. Then
\[ A\bar{x} = \hat{R}^H \hat{Q}^H \bar{x} = \hat{R}^H \hat{Q}^H \hat{Q} y = \hat{R}^H y = b. \]
Also, $\bar{x}$ is the least norm solution to $Ax = b$! (Why?)
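A corresponding sketch for the underdetermined case (not from the lecture; forward substitution via scipy.linalg.solve_triangular with lower=True):

    import numpy as np
    from scipy.linalg import solve_triangular

    rng = np.random.default_rng(2)
    A = rng.standard_normal((3, 7))                  # m < n, full row rank
    b = rng.standard_normal(3)

    Qhat, Rhat = np.linalg.qr(A.conj().T, mode="reduced")   # A^H = Qhat Rhat
    y = solve_triangular(Rhat.conj().T, b, lower=True)      # forward substitution: Rhat^H y = b
    x = Qhat @ y                                            # x = Qhat y solves A x = b

    print(np.allclose(A @ x, b))
    print(np.allclose(x, np.linalg.pinv(A) @ b))            # x is the least norm solution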