Least-Squares Systems and the QR Factorization

Outline:
* Orthogonality
* Least-squares systems
* The Gram-Schmidt and Modified Gram-Schmidt processes
* The Householder QR and the Givens QR

[References: TB: 7,8,,9; AB: .,.3.4; GvL 5, 5.3]

Orthogonality. The Gram-Schmidt algorithm.

* Two vectors u and v are orthogonal if (u, v) = 0.
* A system of vectors {v_1, ..., v_n} is orthogonal if (v_i, v_j) = 0 for i != j; and orthonormal if (v_i, v_j) = δ_ij.
* A matrix is orthogonal if its columns are orthonormal.
* Notation: V = [v_1, ..., v_n] == matrix with column-vectors v_1, ..., v_n.

Least-squares systems

Given: an m x n matrix A, with n < m. Problem: find x which minimizes ||b - Ax||_2.

Good illustration: data fitting. Typical problem of data fitting: we seek an unknown function as a linear combination φ of n known functions φ_i (e.g., polynomials, trigonometric functions). Experimental data (not accurate) provides measurements β_1, ..., β_m of this unknown function at points t_1, ..., t_m. Problem: find the best possible approximation φ to this data.

Question: close in what sense?

Least-squares approximation: find φ such that

    φ(t) = sum_{i=1}^n ξ_i φ_i(t),   with   sum_{j=1}^m |φ(t_j) - β_j|^2 = Min

In linear algebra terms: find the best approximation to a vector b from linear combinations of the vectors f_i, i = 1, ..., n, where

    b = [β_1; β_2; ...; β_m],   f_i = [φ_i(t_1); φ_i(t_2); ...; φ_i(t_m)]

so that φ(t) = sum_{i=1}^n ξ_i φ_i(t) satisfies φ(t_j) ≈ β_j, j = 1, ..., m.
We want to find x = {ξ_i}_{i=1,...,n} such that

    || sum_{i=1}^n ξ_i f_i - b ||  =  Minimum

Define

    F = [f_1, f_2, ..., f_n],   x = [ξ_1; ...; ξ_n]

We want to find x to minimize ||b - Fx||_2. This is a least-squares linear system: F is m x n, with m > n.

THEOREM. The vector x_* minimizes ψ(x) = ||b - Fx||_2^2 if and only if it is the solution of the normal equations:

    F^T F x = F^T b

Proof: Expand out the formula for ψ(x_* + δx):

    ψ(x_* + δx) = ((b - F x_*) - F δx)^T ((b - F x_*) - F δx)
                = ψ(x_*) - 2 (F δx)^T (b - F x_*) + (F δx)^T (F δx)
                = ψ(x_*) - 2 (δx)^T [F^T (b - F x_*)]  +  ||F δx||_2^2
                              (gradient term)              (always >= 0)

In order to have ψ(x_* + δx) >= ψ(x_*) for any δx, the bracketed quantity [the gradient vector] must be zero. Q.E.D.

Illustration of the theorem: x_* is the best approximation to the vector b from the subspace span{F} if and only if b - F x_* is orthogonal to the whole subspace span{F}. This in turn is equivalent to F^T (b - F x_*) = 0, i.e., the normal equations.

[Figure: b, its projection F x_*, and the subspace span{F}]

Example: Points: t_1 = -1, t_2 = -1/2, t_3 = 0, t_4 = 1/2, t_5 = 1. Values: β_1 = 0.1, β_2 = 0.3, β_3 = 0.3, β_4 = 0.2, β_5 = 0.0.

[Figure: scatter plot of the data points]
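The theorem can be checked numerically. A minimal sketch in Python/NumPy: the matrix F and vector b below are arbitrary random data (not from the text); solving the normal equations makes the residual orthogonal to every column of F, and the resulting x_* beats any perturbed candidate.

```python
import numpy as np

# Sketch: verify the normal-equations characterization on random data.
# F (8x3) and b are arbitrary; any full-rank F with m > n works.
rng = np.random.default_rng(0)
F = rng.standard_normal((8, 3))
b = rng.standard_normal(8)

# Solve the normal equations F^T F x = F^T b.
x_star = np.linalg.solve(F.T @ F, F.T @ b)

# Gradient condition: F^T (b - F x*) = 0 (up to roundoff).
grad = F.T @ (b - F @ x_star)
assert np.linalg.norm(grad) < 1e-10

# x* beats any perturbed candidate x* + dx, as the proof's expansion shows.
for _ in range(5):
    dx = rng.standard_normal(3)
    assert (np.linalg.norm(b - F @ x_star)
            <= np.linalg.norm(b - F @ (x_star + dx)) + 1e-12)
print("normal equations verified")
```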
1) Approximation by polynomials of degree 1: φ_1(t) = 1, φ_2(t) = t.

    F = [1 -1; 1 -1/2; 1 0; 1 1/2; 1 1]

    F^T F = [5 0; 0 5/2],   F^T b = [0.9; -0.15]

Best approximation is φ(t) = 0.18 - 0.06 t.

[Figure: data points and the best linear fit]

2) Approximation by polynomials of degree 2: φ_1(t) = 1, φ_2(t) = t, φ_3(t) = t^2.

Best polynomial found: φ(t) ≈ 0.308571 - 0.06 t - 0.257143 t^2.

[Figure: data points and the best quadratic fit]

Problem with Normal Equations

The condition number gets squared: if A is square and nonsingular, then

    κ_2(A) = ||A||_2 ||A^{-1}||_2 = σ_max / σ_min
    κ_2(A^T A) = ||A^T A||_2 ||(A^T A)^{-1}||_2 = (σ_max / σ_min)^2

Example: Let

    A = [1 1; ε 0; 0 ε]

Then κ_2(A) = O(1/ε), but κ_2(A^T A) = O(1/ε^2). Moreover,

    fl(A^T A) = fl([1+ε^2 1; 1 1+ε^2]) = [1 1; 1 1]

is singular to working precision (if ε < sqrt(u), where u is the unit roundoff).

Finding an orthonormal basis of a subspace

Goal: find the vector in span(X) closest to b. Much easier with an orthonormal basis of span(X).

Problem: given X = [x_1, ..., x_n], compute Q = [q_1, ..., q_n] which has orthonormal columns and is such that span(Q) = span(X).

Note: each column of X must be a linear combination of certain columns of Q. We will find Q so that x_j (the j-th column of X) is a linear combination of the first j columns of Q.
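The degree-1 fit above takes a few lines of NumPy. A sketch, using the t_i and β_i of the example; it also checks the conditioning warning, κ(F^T F) = κ(F)^2.

```python
import numpy as np

# Data from the example: points t_i and measured values beta_i.
t = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
beta = np.array([0.1, 0.3, 0.3, 0.2, 0.0])

# Basis phi_1(t) = 1, phi_2(t) = t  ->  F is 5x2.
F = np.column_stack([np.ones_like(t), t])

# Normal equations F^T F x = F^T b.
G = F.T @ F                   # [[5, 0], [0, 2.5]]
rhs = F.T @ beta              # [0.9, -0.15]
x = np.linalg.solve(G, rhs)   # coefficients of 1 and t
print(x)

# Conditioning of the normal equations: kappa(F^T F) = kappa(F)^2.
assert np.isclose(np.linalg.cond(G), np.linalg.cond(F) ** 2)
```

The solve returns [0.18, -0.06], matching φ(t) = 0.18 - 0.06 t above.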
ALGORITHM 1: Classical Gram-Schmidt

1. For j = 1, ..., n Do:
2.   Set q̂ := x_j
3.   Compute r_ij := (q̂, q_i), for i = 1, ..., j-1
4.   For i = 1, ..., j-1 Do:
5.     Compute q̂ := q̂ - r_ij q_i
6.   EndDo
7.   Compute r_jj := ||q̂||_2
8.   If r_jj = 0 then Stop, else q_j := q̂ / r_jj
9. EndDo

All n steps can be completed iff x_1, x_2, ..., x_n are linearly independent.

Lines 5 and 7-8 show that

    x_j = r_1j q_1 + r_2j q_2 + ... + r_jj q_j

If X = [x_1, x_2, ..., x_n], Q = [q_1, q_2, ..., q_n], and if R is the n x n upper triangular matrix R = {r_ij}_{i,j=1,...,n}, then the above relation can be written as

    X = QR

R is upper triangular, Q is orthogonal. This is called the QR factorization of X. What is the cost of the factorization when X ∈ R^{m x n}?

[Figure: X (original matrix) = Q (orthogonal: Q^H Q = I) times R (upper triangular)]

Another decomposition: a matrix X with linearly independent columns is the product of an orthogonal matrix Q and an upper triangular matrix R.

Better algorithm: Modified Gram-Schmidt.

ALGORITHM 2: Modified Gram-Schmidt

1. For j = 1, ..., n Do:
2.   Define q̂ := x_j
3.   For i = 1, ..., j-1 Do:
4.     r_ij := (q̂, q_i)
5.     q̂ := q̂ - r_ij q_i
6.   EndDo
7.   Compute r_jj := ||q̂||_2
8.   If r_jj = 0 then Stop, else q_j := q̂ / r_jj
9. EndDo

Only difference: the inner product in line 4 uses the accumulated partial result q̂ instead of the original x_j.
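The Classical Gram-Schmidt algorithm above translates almost line for line into NumPy. A sketch (the function name cgs_qr is mine), followed by a check that X = QR with orthonormal Q and upper triangular R:

```python
import numpy as np

def cgs_qr(X):
    """QR factorization of X by Classical Gram-Schmidt.

    Assumes the columns of X are linearly independent.
    """
    m, n = X.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for j in range(n):
        qhat = X[:, j].astype(float)          # set qhat := x_j
        R[:j, j] = Q[:, :j].T @ qhat          # r_ij = (qhat, q_i), i < j
        qhat = qhat - Q[:, :j] @ R[:j, j]     # subtract the projections
        R[j, j] = np.linalg.norm(qhat)        # r_jj = ||qhat||_2
        if R[j, j] == 0:
            raise ValueError("columns are linearly dependent")
        Q[:, j] = qhat / R[j, j]
    return Q, R

# Check the factorization on a random 6x4 matrix.
rng = np.random.default_rng(1)
X = rng.standard_normal((6, 4))
Q, R = cgs_qr(X)
assert np.allclose(Q @ R, X)
assert np.allclose(Q.T @ Q, np.eye(4))
assert np.allclose(R, np.triu(R))
```

On the cost question: for X ∈ R^{m x n} this performs roughly 2mn^2 floating-point operations.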
The operations in lines 4 and 5 can be written as

    q̂ := ORTH(q̂, q_i)

where ORTH(x, q) denotes the operation of orthogonalizing a vector x against a unit vector q:

    z = ORTH(x, q) = x - (x, q) q

[Figure: the vectors x, q, and the result z = ORTH(x, q)]

The Modified Gram-Schmidt algorithm is much more stable than Classical Gram-Schmidt in general. [A few examples easily show this.]

Suppose MGS is applied to A, yielding computed matrices Q̂ and R̂. Then there are constants c_i (depending on m and n) such that

    A + E_1 = Q̂ R̂,   ||E_1||_2 <= c_1 u ||A||_2
    ||Q̂^T Q̂ - I||_2 <= c_2 u κ_2(A) + O((u κ_2(A))^2)

for a certain perturbation matrix E_1, and there exists an orthonormal matrix Q such that

    A + E_2 = Q R̂,   ||E_2(:, j)||_2 <= c_3 u ||A(:, j)||_2

for a certain perturbation matrix E_2.
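The stability gap can be observed directly. A sketch comparing the loss of orthogonality ||Q^T Q - I|| produced by the classical and modified variants on an ill-conditioned matrix (the Hilbert matrix is my arbitrary choice of test matrix; any matrix with large κ_2 shows the effect):

```python
import numpy as np

def gram_schmidt(X, modified=True):
    """QR by Gram-Schmidt; modified=False gives the classical variant."""
    m, n = X.shape
    Q = np.zeros((m, n))
    R = np.zeros((n, n))
    for j in range(n):
        qhat = X[:, j].copy()
        for i in range(j):
            # MGS projects the updated qhat; CGS projects the original column.
            R[i, j] = Q[:, i] @ (qhat if modified else X[:, j])
            qhat = qhat - R[i, j] * Q[:, i]
        R[j, j] = np.linalg.norm(qhat)
        Q[:, j] = qhat / R[j, j]
    return Q, R

# Ill-conditioned test matrix: the 10x10 Hilbert matrix, kappa ~ 1e13.
n = 10
H = 1.0 / (np.arange(n)[:, None] + np.arange(n)[None, :] + 1.0)

for modified in (False, True):
    Q, _ = gram_schmidt(H, modified)
    err = np.linalg.norm(Q.T @ Q - np.eye(n))
    print(("MGS" if modified else "CGS"), "loss of orthogonality:", err)
```

Typically CGS loses orthogonality almost completely here, while the MGS loss stays near u κ_2(A), consistent with the bound above; note both variants still satisfy A + E = Q̂R̂ with tiny E.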
ˆq 3 = x 3 (x 3, q )q = = 4 3 ˆq 3 = ˆq 3 (ˆq 3, q )q = ( ) 3 ˆq 3 = 3 q 3 = ˆq 3 ˆq 3 = 3 =.5.5.5.5 8- TB: 7,8,,9; AB:.,.3.4, ;GvL 5, 5.3 QR For this example: what is Q? what is R? Compute Q T Q. Result is the identity matrix. Recall: For any orthogonal matrix Q, we have Q T Q = I (In complex case: Q H Q = I). Consequence: For an n n orthogonal matrix Q = Q T. (Q is orthogonal/ unitary) 8- TB: 7,8,,9; AB:.,.3.4, ;GvL 5, 5.3 QR 8-8- Application: another method for solving linear systems. Ax = b Use of the QR factorization Problem: Ax b in least-squares sense A is an n n nonsingular matrix. Compute its QR factorization. Multiply both sides by Q T Q T QRx = Q T b Rx = Q T b Method: Compute the QR factorization of A, A = QR. Solve the upper triangular system Rx = Q T b. Cost?? A is an m n (full-rank) matrix. Let A = QR the QR factorization of A and consider the normal equations: A T Ax = A T b R T Q T QRx = R T Q T b R T Rx = R T Q T b Rx = Q T b (R T is an n n nonsingular matrix). Therefore, x = R Q T b 8-3 TB: 7,8,,9; AB:.,.3.4, ;GvL 5, 5.3 QR 8-4 TB: 7,8,,9; AB:.,.3.4, ;GvL 5, 5.3 QR 8-3 8-4
Another derivation: recall that span(Q) = span(A). So ||b - Ax||_2 is minimum when b - Ax is orthogonal to span{Q}. Therefore the solution x must satisfy

    Q^T (b - Ax) = 0   ->   Q^T (b - QRx) = 0   ->   R x = Q^T b   ->   x = R^{-1} Q^T b

Also observe that for any vector w,

    w = Q Q^T w + (I - Q Q^T) w

and, since Q Q^T w is orthogonal to (I - Q Q^T) w,

    ||w||_2^2 = ||Q Q^T w||_2^2 + ||(I - Q Q^T) w||_2^2    (Pythagorean theorem)

Hence:

    ||b - Ax||_2^2 = ||b - QRx||_2^2
                   = ||(I - QQ^T) b + Q (Q^T b - Rx)||_2^2
                   = ||(I - QQ^T) b||_2^2 + ||Q (Q^T b - Rx)||_2^2
                   = ||(I - QQ^T) b||_2^2 + ||Q^T b - Rx||_2^2

The minimum is reached when the second term of the right-hand side is zero.

Method: compute the QR factorization of A, A = QR. Compute the right-hand side f = Q^T b. Solve the upper triangular system Rx = f. Then x is the least-squares solution.

As a rule it is not a good idea to form A^T A and solve the normal equations. Methods based on the QR factorization are better. Total cost?? (Depends on the algorithm used to get the QR decomposition.)

Exercise: using matlab, find the parabola that fits the data of the previous data-fitting example in the least-squares sense [verify that the result found is correct].
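The closing exercise can be sketched in NumPy instead of matlab, using the t_i and β_i of the earlier example and the reduced QR from np.linalg.qr:

```python
import numpy as np

# Data from the earlier data-fitting example.
t = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
beta = np.array([0.1, 0.3, 0.3, 0.2, 0.0])

# Basis 1, t, t^2  ->  A is 5x3 and has full rank.
A = np.column_stack([np.ones_like(t), t, t**2])

# Least squares via QR: A = QR (reduced), then solve R x = Q^T b.
Q, R = np.linalg.qr(A)              # Q is 5x3, R is 3x3 upper triangular
x = np.linalg.solve(R, Q.T @ beta)  # a triangular solve in practice
print(x)                            # coefficients of 1, t, t^2

# Verify against the normal equations (fine here, ill-advised in general).
x_ne = np.linalg.solve(A.T @ A, A.T @ beta)
assert np.allclose(x, x_ne)
```

The computed coefficients reproduce the degree-2 polynomial quoted earlier, φ(t) ≈ 0.308571 - 0.06 t - 0.257143 t^2, which verifies the result.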