Orthonormal Transformations Tom Lyche Centre of Mathematics for Applications, Department of Informatics, University of Oslo October 25, 2010
Applications of transformation Q : R m R m, with Q T Q = I 1. solving linear equations (today) 2. solving least squares problems 3. solving eigenvalue problems 4. finding the singular value decomposition
Stability of orthogonal transformations We recall: Suppose Q is an orthonormal matrix, that is Q T Q = I. If x = x + e is an approximation to x then Qx = Qx + Qe and Qe 2 = e 2. If A R m,n and A = A + E is an approximation to A then QA = QA + QE and QE 2 = E 2. Conculsion: when an orthonormal transformation is applied to a vector or a matrix the error will not grow.
QR-Decomposition and Factorization Definition Let A C m,n with m n 1. We say that A = QR is a QR Decomposition of A if Q C m,m is square and unitary and [ ] R1 R = 0 m n,n where R 1 C n,n is upper triangular and 0 m n,n C m n,n is the zero matrix. We call A = QR a QR Factorization of A if Q C m,n has orthonormal columns and R C n,n is upper triangular.
QR decomp QR fact It is easy to construct a QR-factorization from a QR-decomposition A = QR. [ ] R1 We simply partition Q = [Q 1, Q 2 ] and R = where 0 Q 1 R m,n and R 1 R n,n. Then A = Q 1 R 1 is a QR-factorization of A. Conversely we construct the QR-decomposition from the QR-factorization A = Q 1 R 1 by extending the columns of Q 1 to an orthonormal basis Q = [Q 1, Q 2 ] for R m and defining R = [ R 1 0 ] R m,n. If m = n then a factorization is also a decomposition and vice versa.
An example of a QR-decomposition is 1 3 1 1 1 1 1 2 2 3 A = 1 3 7 1 1 4 = 1 1 1 1 1 2 1 1 1 1 0 4 5 0 0 6 = QR, 1 1 2 1 1 1 1 0 0 0 while a QR-factorization A = Q 1 R 1 is obtained by dropping the last column of Q and the last row of R so that 1 1 1 A = 1 1 1 1 2 2 3 2 1 1 1 0 4 5 = Q 1 R 1 0 0 6 1 1 1
Existence of QR Theorem Any matrix A R m,n with m n 1 has a QR decomposition and a QR factorization. The QR factorization is unique if R has positive diagonal elements.
Proof when A has linearly independent columns Existence A T A symmetric positive definite. Cholesky factorization A T A = R T 1 R 1, R 1 R n,n is upper triangular and nonsingular. Q 1 := AR 1 1 Q T 1 Q 1 = I. QR-factorization A = Q 1 R 1 Uniqueness A = Q 1 R 1 A T A = R T 1 R 1 Cholesky factorization unique implies R 1 unique and Q 1 = AR 1 1 unique.
Hadamard s inequality Theorem For any A = [a 1,..., a n ] C n,n we have det(a) n a j 2. (1) j=1 Equality holds if and only if A has a zero column or the columns of A are orthogonal. Example A = [ ] 2 1, det(a) = 3 a 1 2 1 2 a 2 2 = 5.
Proof Let A = QR be a QR factorization of A det(q) = 1 since 1 = det(i) = det(q Q) = det(q ) det(q) = det(q) 2 R = [r 1,..., r n ], (A A) jj = a j 2 2 = (R R) jj = r j 2 2, det(a) = det(qr) = det(r) = n j=1 r jj n j=1 r j 2 = n j=1 a j 2, If equality holds then either det(a) = 0 and A has a zero column, or det(a) 0 and r jj = r j 2 for j = 1,..., n, Then R is diagonal and A A = R R is diagonal The columns of A are orthogonal.
QR and Gram-Schmidt If A R m,n has rank n, then the set of columns {a 1,..., a n } forms a basis for span(a) and the Gram-Schmidt orthogonalization process takes the form v 1 = a 1, v j = a j j 1 i=1 a T j v i v T i v i v i, for j = 2,..., n. {v 1,..., v n } is an orthogonal basis for span(a). j 1 a 1 = v 1, a j = ρ ij v i + v j, where ρ ij = at j v i v T i v i i=1
GS=QR-factorization j 1 a 1 = v 1, a j = ρ ij v i + v j, A = VˆR, where V := [v 1,..., v n ] R m,n and ˆR is unit upper triangular. i=1 Since {v 1,..., v n } is a basis for span(a) the matrix D := diag( v 1 2,..., v n 2 ) is nonsingular, the matrix Q 1 := VD 1 v = [ v 1 2,..., n v n 2 ] is orthogonal. Therefore, A = Q 1 R 1, with R 1 := DˆR is a QR-factorization of A with positive diagonal entries in R 1. v 1
Finding QR Gram Schmidt is numerically unstable Use instead orthogonal transformations they are stable study Householder transformations and Givens rotations
Householder Transformations A matrix H R n,n of the form H := I uu T, where u R n and u T u = 2 is called a Householder transformation. For n = 2 we find H = [ 1 u 2 1 u 1 u 2 u 2 u 1 1 u 2 2 A Householder transformation is symmetric and orthonormal. In particular, H T H = H 2 = (I uu T )(I uu T ) = I 2uu T +u(u T u)u T = I. ].
Alternative representations 1. Householder used I 2uu T, where u T u = 1. 2. For any nonzero v R n the matrix H := I 2 vvt v T v is a Householder transformation. In fact H = I uu T, where u := 2 v v 2.
Transformation Lemma Suppose x, y R n with x 2 = y 2 and v := x y 0. Then ( I 2 vvt v T v) x = y. Proof. Since x T x = y T y we have v T v = (x y) T (x y) = 2x T x 2y T x = 2v T x. (2) But then ( ) I 2 vvt v T v x = x 2v T x v = x v = y. v T v
Geometric Interpretation Hx = y Hx is the mirror image (reflection) of x H = I 2vv T v T v = P vv T T vv, where P := I v T v v T v Px is the orthogonal projection of x into span{x + y} Px = x v T x v T v v (2) = x 1 2 v = 1 (x + y) 2 x+y y=hx Px x
Zero out entries x R n \ 0 and y = αe 1. If α 2 = x T x then x 2 = y 2. Two solutions α = + x 2 and α = x 2 Choose α to have opposite sign of x 1. obtain y x so H exists even if x = e 1. Numerical stability since we avoid cancelation in the subtraction v 1 = x 1 α.
Formulas for an algorithm Lemma For a nonzero vector x R n we define { x 2 if x 1 > 0 α := + x 2 otherwise, (3) and H := I uu T with u = x/α e 1 1 x1 /α. (4) Then H is a Householder transformation and Hx = αe 1.
Proof Let y := αe 1 and v = x y. By construction x T x = y T y and v 0. By Lemma 4 we have Hx = αe 1, where H = I 2 vvt v T v is a Householder transformation. Since 0 < v T v = (x αe 1 ) T (x αe 1 ) = x T x 2αx 1 +α 2 = 2α(α x 1 ), H = I 2(x αe 1)(x αe 1 ) T 2α(α x 1 ) = I (x/α e 1)(x/α e 1 ) T 1 x 1 /α = I uu T.
example For x := [1, 2, 2] T we have x 2 = 3 and since x 1 > 0 we choose α = 3. We find u = [2, 1, 1] T / 3 and H = I 1 2 1 [ 2 1 1 ] = 1 1 2 2 2 2 1. 3 3 1 2 1 2 We see that Hx = 3e 1.
What about x = 0? To simplify later algorithms we define a Householder transformation also for x = 0. x = 0 Ux = 0 for any U. Use u := 2e 1. I uu T is a Householder transformation.
Algorithm housegen Algorithm To given x R n the following algorithm computes a = α and the vector u so that (I uu T )x = αe 1. 1. function [u,a]=housegen(x) 2. a=norm(x); u=x; 3. if a==0 4. u(1)=sqrt(2); return; 5. end 6. if u(1)>0 7. a=-a; 8. end 9. u=u/a; u(1)=u(1)-1; 10. u=u/sqrt(-u(1));
Zero out only part of a vector Suppose y R k, z R n k and α 2 = z T z. Consider finding a Householder transformation H such that H [ y z ] = [ y αe 1 ]. Let [û, α] = housegen(z) and set u T = [0 T, û T ]. Then [ ] [ ] [ ] I 0 0û [0 ] I 0 H = I uu T = û = 0 I 0 Ĥ, where Ĥ = I ûût. Since u T u = û T û = 2 we see that H and Ĥ are Householder transformations.
Householder triangulation A R m,n with m > n. H n H n 1 H 1 A = [ R1 0 ], R 1 is upper triangular and each H k is a Householder transformation. [ ] R1 A = QR, where Q := H 1 H 2 H n and R := 0 A 1 = A [ ] Bk C A k = k 0 D k B k R k 1,k 1 is upper triangular and D k R n k+1,n k+1. zero out first column of D k under diagonal using H k.
Wilkinson Diagram 2 6 4 x x x x x x x x x x x x 3 2 H 7 1 5 6 4 r 11 r 12 r 13 0 x x 0 x x 0 x x 3 2 H 7 2 5 6 4 r 11 r 12 r 13 0 r 22 r 23 0 0 x 0 0 x 3 2 H 7 3 5 6 4 r 11 r 12 r 13 0 r 22 r 23 0 0 r 33 0 0 0 3 7 5.» B2 C A 1 = D 1 A 2 = 2 0 D 2» B3 C A 3 = 3 0 D 3 The transformation is applied to the lower right block. A 4 =» R1 0
m n H m 1 H 1 A = [R 1, S 1 ] = R, R 1 is upper triangular and S 1 R m,n m. If m = n then S 1 = [], the empty matrix.
Algorithm housetriang Algorithm Suppose A R m,n with m n and B R m,r. The algorithm uses housegen to compute Householder transformations H 1,..., H s, where s = min(n, m 1) such that R = H s...h 1 A is upper trapezoidal and C = H s...h 1 B. If B = I then C T R is the QR decomposition of A. If B is the empty matrix then C is the empty matrix with m rows and 0 columns. 1. function [R,C] = housetriang(a,b) 2. [m, n]=size(a); r=size(b,2); A=[A,B]; 3. for k=1:min(n,m-1) 4. [v,a(k,k)]=housegen(a(k:m,k)); 5. C=A(k:m,k+1:n+r); 6. A(k:m,k+1:n+r)=C-v*(v *C);
Solving linear systems using QR-factorization Ax = b and A = Q 1 R 1 R 1 x = Q T 1 b So Householder triangulation can be used as an alternative to Gaussian elimination. Let B R n,r. We give an algorithm using Householder transformations to solve r linear systems AX = B.
Algorithm housesolve Algorithm Suppose A R n,n and B = [b 1,..., b r ] R n,r. The algorithm uses housetriang to solve the r linear systems AX = B. 1. function X = housesolve(a,b) 2. [R,C]=housetriang(A,B); 3. opts.ut=true; %Use upper triangular solver 4. X=linsolve(R,C,opts);
Discussion Advantages with Householder: always works for nonsingular systems pivoting not necessary numerically stable Advantages with Gauss Half the number of flops compared to Householder. Pivoting often not necessary. Usually stable, (but no guarantee). In general better than Householder for banded matrices.
Rotations In some applications, the matrix we want to triangulate has a special structure. Suppose for example that A R n,n is square and upper Hessenberg as illustrated by a Wilkinson diagram for n = 4 x x x x A = x x x x 0 x x x. 0 0 x x Only one entry in each column needs to be annihilated and a full Householder transformation will be inefficient. In this case we can use a simpler transformation.
Givens Rotations Definition A plane rotation (also called a Given s rotation is an orthogonal matrix of the form [ c s P := s c ], where c 2 + s 2 = 1. There is a unique angle θ [0, 2π) such that c = cos θ and s = sin θ. x θ y=px Figure: Clockwise rotation.
Zero in one component Suppose x = [ x1 x 2 ] 0, c := x 1 r, s := x 2 r, r := x 2. Then Px = 1 r [ ] [ ] x1 x 2 x1 = 1 x 2 x 1 x 2 r [ ] x 2 1 + x2 2 = 0 [ ] r, 0 and we have introduced a zero in x. We can take P = I when x = 0.
Rotation in the i, j-plane For 1 i < j n we define a rotation in the i, j-plane as a matrix P ij = (p kl ) R n,n by p kl = δ kl except for positions ii, jj, ij, ji, which are given by [ ] [ ] pii p ij c s =, where c 2 + s 2 = 1. s c p ji p jj P 23 = [ 1 0 0 0 ] 0 c s 0 0 s c 0 0 0 0 1
Triangulating an upper Hessenberg matrix A = [ x x x x ] [ r11 r 12 r 13 r 14 x x x x P 12 0 x x x 0 x x x 0 x x x 0 0 x x 0 0 x x ] [ r11 r 12 r 13 r 14 ] [ r11 r 12 r 13 r 14 P 23 0 r 22 r 23 r 24 P 34 0 r 22 r 23 r 24 0 0 x x 0 0 x x [ 1 0 0 0 ] 0 c s 0 P 23 = 0 s c 0 0 0 0 1 Premultiplication by P 23 only affects rows 2, 3. 0 0 r 33 r 34 0 0 0 r 44 ].
Householder and Givens Householder reduction One column at a time Suitable for full matrices O(4n 3 /3) flops Givens reduction One entry at a time Suitable for banded and sparse matrices O(8n 3 /3) flops for a full matrix (2 times Householder # flops)