Orthogonal Transformations
Tom Lyche, University of Oslo, Norway
Orthogonal Transformations p. 1/3
Applications of Qx with Q^T Q = I
1. solving least squares problems (today)
2. solving linear equations (today)
3. solving eigenvalue problems (next time)
4. finding the singular value decomposition
Stability of orthogonal transformations
We recall: Suppose Q is an orthogonal matrix, that is, Q^T Q = I.
If x + e is an approximation to x, then Q(x + e) = Qx + Qe and ||Qe||_2 = ||e||_2.
If A ∈ R^{m,n} and A + E is an approximation to A, then Q(A + E) = QA + QE and ||QE||_2 = ||E||_2.
Conclusion: when an orthogonal transformation is applied to a vector or a matrix, the error will not grow.
QR-Decomposition and Factorization
Definition 1. Let A ∈ R^{m,n} with m ≥ n ≥ 1.
We say that A = QR is a QR-decomposition of A if Q ∈ R^{m,m} is square and orthogonal and R = [R_1; 0], with R_1 ∈ R^{n,n} upper triangular and 0 ∈ R^{m-n,n} the zero matrix.
We say that A = QR is a QR-factorization of A if Q ∈ R^{m,n} has orthonormal columns and R = R_1 ∈ R^{n,n} is upper triangular.
QR decomp and QR fact
It is easy to construct a QR-factorization from a QR-decomposition A = QR. We simply partition Q = [Q_1, Q_2] and R = [R_1; 0], where Q_1 ∈ R^{m,n} and R_1 ∈ R^{n,n}. Then A = Q_1 R_1 is a QR-factorization of A.
Conversely, we construct the QR-decomposition from the QR-factorization A = Q_1 R_1 by extending the columns of Q_1 to an orthonormal basis Q = [Q_1, Q_2] for R^m and defining R = [R_1; 0] ∈ R^{m,n}.
If m = n the two factorizations are the same.
Example
An example of a QR-decomposition is

    A = [ 1  3  1        Q = (1/2) [ 1  1 -1  1        R = [ 2  2  3
          1  3  7                    1  1  1 -1              0  4  5
          1 -1 -4                    1 -1 -1 -1              0  0  6
          1 -1  2 ],                 1 -1  1  1 ],           0  0  0 ],

so that A = QR, while a QR-factorization A = Q_1 R_1 is obtained by dropping the last column of Q and the last row of R:

    Q_1 = (1/2) [ 1  1 -1        R_1 = [ 2  2  3
                  1  1  1                0  4  5
                  1 -1 -1                0  0  6 ].
                  1 -1  1 ],
Existence of QR Factorization
Lemma 2. A matrix A ∈ R^{m,n} with linearly independent columns has a QR-decomposition and a QR-factorization. The QR-factorization is unique if R_1 has positive diagonal entries.
Proof. Existence: Since A^T A is symmetric positive definite, it has a Cholesky factorization A^T A = R_1^T R_1, where R_1 ∈ R^{n,n} is upper triangular and nonsingular. Setting Q_1 := A R_1^{-1} we find Q_1^T Q_1 = R_1^{-T} A^T A R_1^{-1} = I, so A = Q_1 R_1 is a QR-factorization.
Uniqueness: If A = Q_1 R_1, then A^T A = R_1^T R_1. Since the Cholesky factorization is unique, R_1 is unique, and then Q_1 = A R_1^{-1} is unique.
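The existence proof is constructive and can be checked directly. The following NumPy sketch (illustrative, not part of the slides) builds Q_1 := A R_1^{-1} from the Cholesky factor of A^T A, using the example matrix from the previous slide:

```python
import numpy as np

# QR-factorization via the Cholesky construction in the proof of Lemma 2:
# A^T A = R1^T R1, then Q1 := A R1^{-1} has orthonormal columns.
A = np.array([[1., 3., 1.],
              [1., 3., 7.],
              [1., -1., -4.],
              [1., -1., 2.]])

R1 = np.linalg.cholesky(A.T @ A).T   # upper triangular Cholesky factor
Q1 = A @ np.linalg.inv(R1)

# Q1 has orthonormal columns and A = Q1 R1:
ortho_err = np.linalg.norm(Q1.T @ Q1 - np.eye(3))
factor_err = np.linalg.norm(A - Q1 @ R1)
```

Note that inverting R_1 explicitly is only for illustration; in practice one would use a triangular solve.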
QR and Gram-Schmidt
If A ∈ R^{m,n} has rank n, then the set of columns {a_1, ..., a_n} forms a basis for span(A) and the Gram-Schmidt orthogonalization process takes the form

    v_1 = a_1,  v_j = a_j - sum_{i=1}^{j-1} (a_j^T v_i)/(v_i^T v_i) v_i,  for j = 2, ..., n.

{v_1, ..., v_n} is an orthogonal basis for span(A). Equivalently,

    a_1 = v_1,  a_j = sum_{i=1}^{j-1} ρ_ij v_i + v_j,  where ρ_ij = (a_j^T v_i)/(v_i^T v_i).
GS = QR-factorization
The relations a_1 = v_1, a_j = sum_{i=1}^{j-1} ρ_ij v_i + v_j give A = V R̂, where V := [v_1, ..., v_n] ∈ R^{m,n} and R̂ is unit upper triangular. Since {v_1, ..., v_n} is a basis for span(A), the matrix D := diag(||v_1||_2, ..., ||v_n||_2) is nonsingular, and the matrix Q_1 := V D^{-1} = [v_1/||v_1||_2, ..., v_n/||v_n||_2] has orthonormal columns. Therefore A = Q_1 R_1, with R_1 := D R̂, is a QR-factorization of A with positive diagonal entries in R_1.
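The construction above can be sketched in a few lines of NumPy (an illustrative translation of the slides' formulas; classical Gram-Schmidt is numerically unstable, as noted later, so this is for understanding only):

```python
import numpy as np

def gram_schmidt_qr(A):
    """QR-factorization A = Q1 R1 via classical Gram-Schmidt,
    following the slides: A = V Rhat, D = diag(||v_i||_2),
    Q1 = V D^{-1}, R1 = D Rhat. Unstable; illustration only."""
    m, n = A.shape
    V = A.astype(float).copy()
    Rhat = np.eye(n)                          # unit upper triangular
    for j in range(1, n):
        for i in range(j):
            rho = (A[:, j] @ V[:, i]) / (V[:, i] @ V[:, i])
            Rhat[i, j] = rho
            V[:, j] -= rho * V[:, i]
    D = np.diag(np.linalg.norm(V, axis=0))
    Q1 = V @ np.linalg.inv(D)
    R1 = D @ Rhat
    return Q1, R1

A = np.array([[1., 3., 1.], [1., 3., 7.], [1., -1., -4.], [1., -1., 2.]])
Q1, R1 = gram_schmidt_qr(A)
```

On the example matrix this reproduces R_1 = [2 2 3; 0 4 5; 0 0 6] with positive diagonal, matching the Cholesky-based construction.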
QR and Least Squares
Suppose A ∈ R^{m,n} has rank n and let b ∈ R^m. Consider the least squares problem min_{x ∈ R^n} ||Ax - b||_2. Suppose A = QR is a QR-decomposition of A. We partition Q = [Q_1, Q_2] and R = [R_1; 0], where Q_1 ∈ R^{m,n} and R_1 ∈ R^{n,n}. Then

    ||Ax - b||_2^2 = ||QRx - b||_2^2 = ||Rx - Q^T b||_2^2 = ||R_1 x - Q_1^T b||_2^2 + ||Q_2^T b||_2^2.

Thus ||Ax - b||_2 ≥ ||Q_2^T b||_2 for all x ∈ R^n, with equality if R_1 x = Q_1^T b.
Least Squares using QR
1. Find a QR-factorization A = Q_1 R_1 of A.
2. Solve R_1 x = Q_1^T b for the least squares solution x.
Consider the least squares problem with

    A = [ 1  3  1        b = [ 1
          1  3  7              1
          1 -1 -4              1
          1 -1  2 ],           1 ].

The least squares solution x is found by solving the system R_1 x = Q_1^T b:

    [ 2  2  3   [ x_1         1 [ 1  1  1  1    [ 1       [ 2
      0  4  5     x_2    =    -   1  1 -1 -1      1    =    0
      0  0  6 ]   x_3 ]       2  -1  1 -1  1 ]    1         0 ],
                                                  1 ]

and we find x = [1, 0, 0]^T.
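The same two-step recipe, checked numerically with NumPy's built-in QR (an illustrative sketch; `np.linalg.qr` returns the factorization Q_1, R_1 in its default "reduced" mode):

```python
import numpy as np

A = np.array([[1., 3., 1.], [1., 3., 7.], [1., -1., -4.], [1., -1., 2.]])
b = np.array([1., 1., 1., 1.])

# Step 1: QR-factorization A = Q1 R1.
Q1, R1 = np.linalg.qr(A)
# Step 2: solve the triangular system R1 x = Q1^T b.
x = np.linalg.solve(R1, Q1.T @ b)
```

Here Ax = b holds exactly for x = [1, 0, 0]^T, so the least squares residual is zero.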
Finding QR
- Gram-Schmidt is numerically unstable.
- Use instead orthogonal transformations; they are stable.
- We study Householder transformations and Givens rotations.
Householder Transformations
A matrix H ∈ R^{n,n} of the form H := I - uu^T, where u ∈ R^n and u^T u = 2, is called a Householder transformation. For n = 2 we find

    H = [ 1 - u_1^2    -u_1 u_2
          -u_2 u_1     1 - u_2^2 ].

A Householder transformation is symmetric and orthogonal. In particular,

    H^T H = H^2 = (I - uu^T)(I - uu^T) = I - 2uu^T + u(u^T u)u^T = I.
Alternative representations
Householder himself used I - 2uu^T, where u^T u = 1. For any nonzero v ∈ R^n the matrix

    H := I - 2 vv^T/(v^T v)

is a Householder transformation. In fact H = I - uu^T, where u := sqrt(2) v/||v||_2.
Transformation
Lemma 3. Suppose x, y ∈ R^n with ||x||_2 = ||y||_2 and v := x - y ≠ 0. Then

    (I - 2 vv^T/(v^T v)) x = y.

Proof. Since x^T x = y^T y we have

    v^T v = (x - y)^T (x - y) = 2x^T x - 2y^T x = 2v^T x.   (1)

But then (I - 2 vv^T/(v^T v)) x = x - (2v^T x)/(v^T v) v = x - v = y.
Geometric Interpretation
Hx = y: Hx is the mirror image (reflection) of x.
H = I - 2 vv^T/(v^T v) = P - vv^T/(v^T v), where P := I - vv^T/(v^T v).
Px is the orthogonal projection of x into span{x + y}:

    Px = x - (v^T x)/(v^T v) v = x - (1/2)v = (1/2)(x + y),

using (1).
[Figure: x is reflected to y = Hx across the line through Px = (x + y)/2.]
Zero out entries
Let x ∈ R^n \ {0} and y = αe_1. If α^2 = x^T x, then ||x||_2 = ||y||_2.
There are two solutions: α = +||x||_2 and α = -||x||_2.
Choose α to have the opposite sign of x_1:
- we obtain y ≠ x, so H exists even if x = e_1;
- numerical stability, since we avoid cancellation in the subtraction v_1 = x_1 - α.
Formulas for an algorithm
Lemma 1. For a nonzero vector x ∈ R^n we define

    α := -||x||_2 if x_1 > 0,  α := +||x||_2 otherwise,   (2)

and H := I - uu^T with

    u = (x/α - e_1)/sqrt(1 - x_1/α).   (3)

Then H is a Householder transformation and Hx = αe_1.
Proof
Let y := αe_1 and v := x - y. By construction x^T x = y^T y and v ≠ 0, so by Lemma 3 we have Hx = αe_1, where H = I - 2 vv^T/(v^T v) is a Householder transformation. Since

    0 < v^T v = (x - αe_1)^T (x - αe_1) = x^T x - 2αx_1 + α^2 = 2α(α - x_1),

we find

    H = I - (x - αe_1)(x - αe_1)^T/(α(α - x_1)) = I - (x/α - e_1)(x/α - e_1)^T/(1 - x_1/α) = I - uu^T.
Example
For x := [1, 2, 2]^T we have ||x||_2 = 3, and since x_1 > 0 we choose α = -3. We find u = -[2, 1, 1]^T/sqrt(3) and

    H = I - (1/3) [ 4  2  2         (1/3) [ -1 -2 -2
                    2  1  1     =           -2  2 -1
                    2  1  1 ]               -2 -1  2 ].

We see that Hx = -3e_1 = αe_1.
Algorithm housegen
Given x ∈ R^n, the following algorithm computes a = α and the vector u so that (I - uu^T)x = αe_1.

function [u, a] = housegen(x)
a = norm(x); u = x;
if a == 0
    u(1) = sqrt(2); return;
end
if u(1) > 0
    a = -a;
end
u = u/a; u(1) = u(1) - 1;
u = u/sqrt(-u(1));
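A line-by-line Python translation of housegen (illustrative; the slides use MATLAB), checked on the worked example x = [1, 2, 2]^T:

```python
import numpy as np

def housegen(x):
    """Return (u, a) with H = I - u u^T a Householder transformation
    and H x = a*e1, following Algorithm housegen above."""
    x = np.asarray(x, dtype=float)
    a = np.linalg.norm(x)
    u = x.copy()
    if a == 0.0:
        u[0] = np.sqrt(2.0)        # x = 0: any H works, take u = sqrt(2) e1
        return u, a
    if u[0] > 0:
        a = -a                     # choose alpha opposite in sign to x1
    u = u / a
    u[0] -= 1.0                    # u = x/a - e1, not yet normalized
    u = u / np.sqrt(-u[0])         # scale so that u^T u = 2
    return u, a

u, a = housegen([1.0, 2.0, 2.0])
H = np.eye(3) - np.outer(u, u)
```

For this x we get a = -3 and H x = -3 e_1, as in the example slide.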
Zero only part of a vector
Suppose y ∈ R^k, z ∈ R^{n-k}, and α^2 = z^T z. Consider finding a Householder transformation H such that H [y; z] = [y; αe_1]. Let [û, α] = housegen(z) and set u^T = [0^T, û^T]. Then

    H = I - uu^T = [ I  0      [ 0                    [ I  0
                     0  I ]  -   û ] [0^T  û^T]   =     0  Ĥ ],

where Ĥ := I - ûû^T. Since u^T u = û^T û = 2, both H and Ĥ are Householder transformations.
Householder Triangulation

    [ x x x    H_1   [ r11 r12 r13    H_2   [ r11 r12 r13    H_3   [ r11 r12 r13
      x x x     →      0   x   x       →      0  r22 r23      →      0  r22 r23
      x x x            0   x   x              0   0   x              0   0  r33
      x x x ]          0   x   x ]            0   0   x ]            0   0   0 ]

    A_1 = D_1,   A_2 = [ B_2  C_2    A_3 = [ B_3  C_3    A_4 = [ R_1
                         0    D_2 ],         0    D_3 ],         0  ].

Each transformation is applied to the lower right block D_k. We obtain A = QR, where Q := H_1 H_2 ··· H_n and R := [R_1; 0].
Algorithm houselsq
Suppose A ∈ R^{m,n} has rank n and that b ∈ R^m. This algorithm uses Householder transformations to solve the least squares problem min_x ||Ax - b||_2 if m > n and the linear system Ax = b if m = n. It uses housegen and backsolve.

function x = houselsq(A, b)
[m, n] = size(A);
A = [A b];
for k = 1:min(n, m-1)
    [v, A(k,k)] = housegen(A(k:m, k));
    A(k:m, k+1:n+1) = A(k:m, k+1:n+1) - v*(v'*A(k:m, k+1:n+1));
end
x = backsolve(A(1:n, 1:n), A(1:n, n+1));
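An illustrative Python translation of houselsq (not the slides' MATLAB; `backsolve` is replaced by a triangular solve on the upper triangle, since the entries below the diagonal are left untouched by the updates):

```python
import numpy as np

def housegen(x):
    # as in Algorithm housegen: (I - u u^T) x = a*e1
    a = np.linalg.norm(x)
    u = np.asarray(x, dtype=float).copy()
    if a == 0.0:
        u[0] = np.sqrt(2.0)
        return u, a
    if u[0] > 0:
        a = -a
    u /= a
    u[0] -= 1.0
    return u / np.sqrt(-u[0]), a

def houselsq(A, b):
    """Solve min ||Ax - b||_2 (m > n) or Ax = b (m = n) by
    Householder triangulation, following Algorithm houselsq."""
    A = np.column_stack([np.asarray(A, float), np.asarray(b, float)])
    m, n = A.shape[0], A.shape[1] - 1
    for k in range(min(n, m - 1)):
        v, A[k, k] = housegen(A[k:m, k].copy())
        # apply H_k = I - v v^T to the remaining columns, b included
        A[k:m, k+1:n+1] -= np.outer(v, v @ A[k:m, k+1:n+1])
    # solve the triangular system R1 x = Q1^T b
    return np.linalg.solve(np.triu(A[:n, :n]), A[:n, n])

A = np.array([[1., 3., 1.], [1., 3., 7.], [1., -1., -4.], [1., -1., 2.]])
x = houselsq(A, [1., 1., 1., 1.])
```

On the earlier least squares example this returns x = [1, 0, 0]^T; for a square nonsingular system it returns the exact solution of Ax = b.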
QR without full rank
Theorem 2. Suppose m ≥ n ≥ 1 and A ∈ R^{m,n}. Then A has a QR-decomposition and a QR-factorization.
Compare with normal equations
A = QR and R_1 x = Q_1^T b, compared to A^T Ax = A^T b.
- Flops: Householder 2mn^2 - 2n^3/3, normal equations mn^2.
- Householder is twice as expensive for large m.
- Householder is numerically stable.
- The normal equations are often ill-conditioned.
Squaring the condition number
Recall that the 2-norm condition number of A is the square root of the ratio of the largest to the smallest eigenvalue of A^T A. It follows that the 2-norm condition number of A^T A is the square of the 2-norm condition number of A. Thus if A is mildly ill-conditioned, the normal equations can be quite ill-conditioned, and solving the normal equations can give inaccurate results.
What about linear systems?
houselsq can be used to solve linear systems. If A is nonsingular and m = n, then the output x will be the solution of Ax = b. This follows since the QR-decomposition and the QR-factorization are the same when A is square. Therefore, if Rx = Q^T b, then ||Ax - b||_2 = ||QRx - b||_2 = ||Rx - Q^T b||_2 = 0. So Algorithm houselsq can be used as an alternative to Gaussian elimination. The two methods are similar in that both reduce A to upper triangular form using a sequence of transformations.
Which method is better?
Linear systems can be constructed where Gaussian elimination with partial pivoting fails numerically. On the other hand, the transformations used in Householder triangulation are orthogonal, so the method is quite stable. So why is Gaussian elimination more popular than Householder triangulation? One reason is the flop count: for m = n, Householder triangulation uses 4n^3/3 flops, while Gaussian elimination uses half of that. Numerical stability can be a problem with Gaussian elimination, but years and years of experience show that it works well for most practical problems, and pivoting is often not necessary. Tradition might also play a role.
Rotations
In some applications, the matrix we want to triangulate has a special structure. Suppose for example that A ∈ R^{n,n} is square and upper Hessenberg, as illustrated by a Wilkinson diagram for n = 4:

    A = [ x x x x
          x x x x
          0 x x x
          0 0 x x ].

Only one entry in each column needs to be annihilated, and a full Householder transformation would be inefficient. In this case we can use a simpler transformation.
Givens Rotations
Definition 3. A plane rotation (also called a Givens rotation) is an orthogonal matrix of the form

    P := [ c  s        where c^2 + s^2 = 1.
          -s  c ],

There is a unique angle θ ∈ [0, 2π) such that c = cos θ and s = sin θ.
[Figure 1: Clockwise rotation taking x to y = Px.]
Zero in one component
Suppose x = [x_1; x_2] ≠ 0 and set r := ||x||_2, c := x_1/r, s := x_2/r. Then

    Px = (1/r) [ x_1  x_2    [ x_1     = (1/r) [ x_1^2 + x_2^2     = [ r
                -x_2  x_1 ]    x_2 ]             0             ]       0 ],

and we have introduced a zero in x. We can take P = I when x = 0.
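The construction above in NumPy (an illustrative helper, not from the slides):

```python
import numpy as np

def givens(x1, x2):
    """Return the 2x2 plane rotation P = [[c, s], [-s, c]] with
    P @ [x1, x2] = [r, 0], where r = sqrt(x1^2 + x2^2)."""
    r = np.hypot(x1, x2)
    if r == 0.0:
        return np.eye(2)             # take P = I when x = 0
    c, s = x1 / r, x2 / r
    return np.array([[c, s], [-s, c]])

P = givens(3.0, 4.0)
y = P @ np.array([3.0, 4.0])
```

Here r = 5, so the rotation maps [3, 4]^T to [5, 0]^T, and P is orthogonal by construction.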
Rotation in the i,j-plane
For 1 ≤ i < j ≤ n we define a rotation in the i,j-plane as a matrix P_ij = (p_kl) ∈ R^{n,n} with p_kl = δ_kl, except for the positions ii, jj, ij, ji, which are given by

    [ p_ii  p_ij     [ c  s        where c^2 + s^2 = 1.
      p_ji  p_jj ] =  -s  c ],

For example, for n = 4,

    P_23 = [ 1  0  0  0
             0  c  s  0
             0 -s  c  0
             0  0  0  1 ].
Triangulating an upper Hessenberg matrix

    A = [ x x x x           [ r11 r12 r13 r14           [ r11 r12 r13 r14           [ r11 r12 r13 r14
          x x x x    P_12     0   x   x   x      P_23     0  r22 r23 r24     P_34     0  r22 r23 r24
          0 x x x    →        0   x   x   x      →        0   0   x   x      →        0   0  r33 r34
          0 0 x x ]           0   0   x   x ]             0   0   x   x ]             0   0   0  r44 ]
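The sweep P_34 P_23 P_12 A = R above can be sketched in NumPy (an illustrative function name and test matrix, not from the slides): each rotation acts only on two adjacent rows, so the whole reduction costs O(n^2) flops instead of the O(n^3) of a full Householder triangulation.

```python
import numpy as np

def hessenberg_to_triangular(A):
    """Reduce an upper Hessenberg matrix to upper triangular form
    with n-1 Givens rotations, one per subdiagonal entry."""
    R = np.asarray(A, dtype=float).copy()
    n = R.shape[0]
    for k in range(n - 1):
        r = np.hypot(R[k, k], R[k+1, k])
        if r == 0.0:
            continue                     # entry already zero
        c, s = R[k, k] / r, R[k+1, k] / r
        G = np.array([[c, s], [-s, c]])
        R[k:k+2, k:] = G @ R[k:k+2, k:]  # rotate rows k and k+1
    return R

A = np.array([[4., 1., 2., 3.],
              [2., 5., 1., 0.],
              [0., 3., 6., 1.],
              [0., 0., 2., 7.]])
R = hessenberg_to_triangular(A)
```

Since each plane rotation has determinant 1, det(R) = det(A), which gives a quick sanity check.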