Orthonormal Transformations and Least Squares
Tom Lyche
Centre of Mathematics for Applications, Department of Informatics, University of Oslo
October 30, 2009
Applications of $Qx$ with $Q^TQ = I$
1. solving linear equations (today)
2. solving least squares problems (today)
3. solving eigenvalue problems (next time)
4. finding the singular value decomposition
Stability of orthogonal transformations
We recall: suppose $Q$ is an orthonormal matrix, that is, $Q^TQ = I$.
If $\hat{x} = x + e$ is an approximation to $x$, then $Q\hat{x} = Qx + Qe$ and $\|Qe\|_2 = \|e\|_2$.
If $A \in \mathbb{R}^{m,n}$ and $\hat{A} = A + E$ is an approximation to $A$, then $Q\hat{A} = QA + QE$ and $\|QE\|_2 = \|E\|_2$.
Conclusion: when an orthonormal transformation is applied to a vector or a matrix, the error will not grow.
QR-Decomposition and Factorization
Definition. Let $A \in \mathbb{C}^{m,n}$ with $m \ge n \ge 1$. We say that $A = QR$ is a QR decomposition of $A$ if $Q \in \mathbb{C}^{m,m}$ is square and unitary and
\[ R = \begin{bmatrix} R_1 \\ 0_{m-n,n} \end{bmatrix}, \]
where $R_1 \in \mathbb{C}^{n,n}$ is upper triangular and $0_{m-n,n} \in \mathbb{C}^{m-n,n}$ is the zero matrix. We call $A = QR$ a QR factorization of $A$ if $Q \in \mathbb{C}^{m,n}$ has orthonormal columns and $R \in \mathbb{C}^{n,n}$ is upper triangular.
QR decomposition and QR factorization
It is easy to construct a QR factorization from a QR decomposition $A = QR$. We simply partition $Q = [Q_1, Q_2]$ and $R = \begin{bmatrix} R_1 \\ 0 \end{bmatrix}$, where $Q_1 \in \mathbb{R}^{m,n}$ and $R_1 \in \mathbb{R}^{n,n}$. Then $A = Q_1R_1$ is a QR factorization of $A$.
Conversely, we construct the QR decomposition from the QR factorization $A = Q_1R_1$ by extending the columns of $Q_1$ to an orthonormal basis $Q = [Q_1, Q_2]$ for $\mathbb{R}^m$ and defining $R = \begin{bmatrix} R_1 \\ 0 \end{bmatrix} \in \mathbb{R}^{m,n}$.
If $m = n$ then a factorization is also a decomposition and vice versa.
An example of a QR decomposition is
\[ A = \begin{bmatrix} 1 & 3 & 1 \\ 1 & 3 & 7 \\ 1 & -1 & -4 \\ 1 & -1 & 2 \end{bmatrix} = \frac{1}{2}\begin{bmatrix} 1 & 1 & -1 & -1 \\ 1 & 1 & 1 & 1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix} \begin{bmatrix} 2 & 2 & 3 \\ 0 & 4 & 5 \\ 0 & 0 & 6 \\ 0 & 0 & 0 \end{bmatrix} = QR, \]
while a QR factorization $A = Q_1R_1$ is obtained by dropping the last column of $Q$ and the last row of $R$, so that
\[ A = \frac{1}{2}\begin{bmatrix} 1 & 1 & -1 \\ 1 & 1 & 1 \\ 1 & -1 & -1 \\ 1 & -1 & 1 \end{bmatrix} \begin{bmatrix} 2 & 2 & 3 \\ 0 & 4 & 5 \\ 0 & 0 & 6 \end{bmatrix} = Q_1R_1. \]
Existence of QR Factorization
Theorem. Suppose $A \in \mathbb{C}^{m,n}$ with $m \ge n \ge 1$ has linearly independent columns. Then $A$ has a QR decomposition and a QR factorization. The QR factorization is unique if $R_1$ has positive diagonal elements.
Proof
Existence: $A^TA$ is symmetric positive definite, so it has a Cholesky factorization $A^TA = R_1^TR_1$, where $R_1 \in \mathbb{R}^{n,n}$ is upper triangular and nonsingular. Then $Q_1 := AR_1^{-1}$ satisfies $Q_1^TQ_1 = I$, and $A = Q_1R_1$ is a QR factorization.
Uniqueness: $A = Q_1R_1 \Rightarrow A^TA = R_1^TR_1$. Since the Cholesky factorization is unique, $R_1$ is unique, and then $Q_1 = AR_1^{-1}$ is unique.
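This proof is constructive and translates directly into MATLAB. A minimal sketch (the function name cholqr is mine; forming $A^TA$ squares the condition number, so this illustrates the proof rather than a recommended way to compute a QR factorization):

function [Q1,R1] = cholqr(A)
% QR factorization via the Cholesky factorization of A'*A
% (constructive version of the existence proof; illustration only)
R1 = chol(A'*A);   % A'*A = R1'*R1, R1 upper triangular and nonsingular
Q1 = A/R1;         % Q1 := A*R1^(-1) has orthonormal columns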
Hadamard's inequality
Theorem (Hadamard's Inequality). For any $A = [a_1, \ldots, a_n] \in \mathbb{C}^{n,n}$ we have
\[ |\det(A)| \le \prod_{j=1}^n \|a_j\|_2. \quad (1) \]
Equality holds if and only if $A$ has a zero column or the columns of $A$ are orthogonal.
Example: $A = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}$, $|\det(A)| = 3 \le \|a_1\|_2\|a_2\|_2 = 5$.
Proof
Let $A = QR$ be a QR factorization of $A$.
$1 = \det(I) = \det(Q^HQ) = \det(Q^H)\det(Q) = |\det(Q)|^2$, so $|\det(Q)| = 1$.
With $R = [r_1, \ldots, r_n]$ we have $(A^HA)_{jj} = \|a_j\|_2^2 = (R^HR)_{jj} = \|r_j\|_2^2$, so
\[ |\det(A)| = |\det(QR)| = |\det(R)| = \prod_{j=1}^n |r_{jj}| \le \prod_{j=1}^n \|r_j\|_2 = \prod_{j=1}^n \|a_j\|_2. \]
If equality holds then either $\det(A) = 0$ and $A$ has a zero column, or $\det(A) \ne 0$ and $|r_{jj}| = \|r_j\|_2$ for $j = 1, \ldots, n$. Then $R$ is diagonal and $A^HA = R^HR$ is diagonal, so the columns of $A$ are orthogonal.
QR and Gram-Schmidt
If $A \in \mathbb{R}^{m,n}$ has rank $n$, then the set of columns $\{a_1, \ldots, a_n\}$ forms a basis for $\mathrm{span}(A)$ and the Gram-Schmidt orthogonalization process takes the form
\[ v_1 = a_1, \qquad v_j = a_j - \sum_{i=1}^{j-1} \frac{a_j^Tv_i}{v_i^Tv_i}\, v_i, \quad j = 2, \ldots, n. \]
$\{v_1, \ldots, v_n\}$ is an orthogonal basis for $\mathrm{span}(A)$. Equivalently,
\[ a_1 = v_1, \qquad a_j = \sum_{i=1}^{j-1} \rho_{ij} v_i + v_j, \quad \text{where } \rho_{ij} = \frac{a_j^Tv_i}{v_i^Tv_i}. \]
GS = QR factorization
From $a_1 = v_1$ and $a_j = \sum_{i=1}^{j-1} \rho_{ij}v_i + v_j$ we get $A = V\hat{R}$, where $V := [v_1, \ldots, v_n] \in \mathbb{R}^{m,n}$ and $\hat{R}$ is unit upper triangular.
Since $\{v_1, \ldots, v_n\}$ is a basis for $\mathrm{span}(A)$, the matrix $D := \mathrm{diag}(\|v_1\|_2, \ldots, \|v_n\|_2)$ is nonsingular and the matrix $Q_1 := VD^{-1} = \big[\frac{v_1}{\|v_1\|_2}, \ldots, \frac{v_n}{\|v_n\|_2}\big]$ has orthonormal columns.
Therefore $A = Q_1R_1$, with $R_1 := D\hat{R}$, is a QR factorization of $A$ with positive diagonal entries in $R_1$.
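This construction can be written out in MATLAB. A minimal sketch (the function name is mine; as the next slide notes, classical Gram-Schmidt is numerically unstable, so this is for illustration only):

function [Q1,R1] = gramschmidt_qr(A)
% Classical Gram-Schmidt QR factorization A = Q1*R1 (illustration only)
[m,n] = size(A);
V = zeros(m,n); Rhat = eye(n);   % Rhat will be unit upper triangular
for j = 1:n
    V(:,j) = A(:,j);
    for i = 1:j-1
        Rhat(i,j) = (A(:,j)'*V(:,i))/(V(:,i)'*V(:,i));   % rho_ij
        V(:,j) = V(:,j) - Rhat(i,j)*V(:,i);
    end
end
D = diag(sqrt(sum(V.^2,1)));   % D = diag(||v_1||_2,...,||v_n||_2)
Q1 = V/D;                      % Q1 := V*D^(-1) has orthonormal columns
R1 = D*Rhat;                   % R1 := D*Rhat has positive diagonal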
Finding QR
Gram-Schmidt is numerically unstable.
Use instead orthogonal transformations; they are stable.
We study Householder transformations and Givens rotations.
Householder Transformations
A matrix $H \in \mathbb{R}^{n,n}$ of the form $H := I - uu^T$, where $u \in \mathbb{R}^n$ and $u^Tu = 2$, is called a Householder transformation. For $n = 2$ we find
\[ H = \begin{bmatrix} 1 - u_1^2 & -u_1u_2 \\ -u_2u_1 & 1 - u_2^2 \end{bmatrix}. \]
A Householder transformation is symmetric and orthonormal. In particular,
\[ H^TH = H^2 = (I - uu^T)(I - uu^T) = I - 2uu^T + u(u^Tu)u^T = I. \]
Alternative representations
1. Householder used $I - 2uu^T$, where $u^Tu = 1$.
2. For any nonzero $v \in \mathbb{R}^n$ the matrix $H := I - 2\frac{vv^T}{v^Tv}$ is a Householder transformation. In fact $H = I - uu^T$, where $u := \sqrt{2}\, v/\|v\|_2$.
Transformation Lemma
Lemma. Suppose $x, y \in \mathbb{R}^n$ with $\|x\|_2 = \|y\|_2$ and $v := x - y \ne 0$. Then
\[ \Big(I - 2\frac{vv^T}{v^Tv}\Big)x = y. \]
Proof. Since $x^Tx = y^Ty$ we have
\[ v^Tv = (x - y)^T(x - y) = 2x^Tx - 2y^Tx = 2v^Tx. \quad (2) \]
But then
\[ \Big(I - 2\frac{vv^T}{v^Tv}\Big)x = x - \frac{2v^Tx}{v^Tv}\,v = x - v = y. \]
Geometric Interpretation
$Hx = y$: $Hx$ is the mirror image (reflection) of $x$.
$H = I - 2\frac{vv^T}{v^Tv} = P - \frac{vv^T}{v^Tv}$, where $P := I - \frac{vv^T}{v^Tv}$.
$Px$ is the orthogonal projection of $x$ into $\mathrm{span}\{x + y\}$:
\[ Px = x - \frac{v^Tx}{v^Tv}\,v \overset{(2)}{=} x - \frac{1}{2}v = \frac{1}{2}(x + y). \]
[Figure: reflection of $x$ to $y = Hx$, with $Px$ the midpoint between $x$ and $y$.]
Zero out entries
Given $x \in \mathbb{R}^n \setminus \{0\}$ and $y = \alpha e_1$. If $\alpha^2 = x^Tx$ then $\|x\|_2 = \|y\|_2$.
There are two solutions: $\alpha = +\|x\|_2$ and $\alpha = -\|x\|_2$.
Choose $\alpha$ to have the opposite sign of $x_1$. We then obtain $y \ne x$, so $H$ exists even if $x = e_1$.
This also gives numerical stability, since we avoid cancellation in the subtraction $v_1 = x_1 - \alpha$.
Formulas for an algorithm
Lemma. For a nonzero vector $x \in \mathbb{R}^n$ we define
\[ \alpha := \begin{cases} -\|x\|_2 & \text{if } x_1 > 0 \\ +\|x\|_2 & \text{otherwise}, \end{cases} \quad (3) \]
and
\[ H := I - uu^T \quad \text{with} \quad u = \frac{x/\alpha - e_1}{\sqrt{1 - x_1/\alpha}}. \quad (4) \]
Then $H$ is a Householder transformation and $Hx = \alpha e_1$.
Proof
Let $y := \alpha e_1$ and $v = x - y$. By construction $x^Tx = y^Ty$ and $v \ne 0$. By the Transformation Lemma we have $Hx = \alpha e_1$, where $H = I - 2\frac{vv^T}{v^Tv}$ is a Householder transformation. Since
\[ 0 < v^Tv = (x - \alpha e_1)^T(x - \alpha e_1) = x^Tx - 2\alpha x_1 + \alpha^2 = 2\alpha(\alpha - x_1), \]
we get
\[ H = I - \frac{2(x - \alpha e_1)(x - \alpha e_1)^T}{2\alpha(\alpha - x_1)} = I - \frac{(x/\alpha - e_1)(x/\alpha - e_1)^T}{1 - x_1/\alpha} = I - uu^T. \]
Example
For $x := [1, 2, 2]^T$ we have $\|x\|_2 = 3$, and since $x_1 > 0$ we choose $\alpha = -3$. We find $u = [2, 1, 1]^T/\sqrt{3}$ and
\[ H = I - \frac{1}{3}\begin{bmatrix} 2 \\ 1 \\ 1 \end{bmatrix}\begin{bmatrix} 2 & 1 & 1 \end{bmatrix} = \frac{1}{3}\begin{bmatrix} -1 & -2 & -2 \\ -2 & 2 & -1 \\ -2 & -1 & 2 \end{bmatrix}. \]
We see that $Hx = -3e_1$.
What about x = 0?
To simplify later algorithms we define a Householder transformation also for $x = 0$.
$x = 0 \Rightarrow Ux = 0$ for any $U$.
Use $u := \sqrt{2}\,e_1$. Then $I - uu^T$ is a Householder transformation.
Algorithm housegen
Given $x \in \mathbb{R}^n$, the following algorithm computes $a = \alpha$ and the vector $u$ so that $(I - uu^T)x = \alpha e_1$.

function [u,a] = housegen(x)
% Compute u and a = alpha so that (I - u*u')*x = a*e_1
a = norm(x); u = x;
if a == 0
    u(1) = sqrt(2); return;   % Householder transformation for x = 0
end
if u(1) > 0
    a = -a;                   % alpha has the opposite sign of x_1
end
u = u/a; u(1) = u(1) - 1;     % u = x/alpha - e_1
u = u/sqrt(-u(1));            % divide by sqrt(1 - x_1/alpha), cf. (4)
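For example, with $x = [1, 2, 2]^T$ as in the earlier example:

x = [1; 2; 2];
[u,a] = housegen(x);    % a = -3 and u = -[2; 1; 1]/sqrt(3)
(eye(3) - u*u')*x       % approximately [-3; 0; 0] = a*e_1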
Zero out only part of a vector
Suppose $y \in \mathbb{R}^k$, $z \in \mathbb{R}^{n-k}$ and $\alpha^2 = z^Tz$. Consider finding a Householder transformation $H$ such that
\[ H\begin{bmatrix} y \\ z \end{bmatrix} = \begin{bmatrix} y \\ \alpha e_1 \end{bmatrix}. \]
Let $[\hat{u}, \alpha] = \mathrm{housegen}(z)$ and set $u^T = [0^T, \hat{u}^T]$. Then
\[ H = I - uu^T = I - \begin{bmatrix} 0 \\ \hat{u} \end{bmatrix}\begin{bmatrix} 0^T & \hat{u}^T \end{bmatrix} = \begin{bmatrix} I & 0 \\ 0 & \hat{H} \end{bmatrix}, \quad \text{where } \hat{H} = I - \hat{u}\hat{u}^T. \]
Since $u^Tu = \hat{u}^T\hat{u} = 2$, we see that $H$ and $\hat{H}$ are Householder transformations.
Householder triangulation
$A \in \mathbb{R}^{m,n}$ with $m > n$. We find
\[ H_nH_{n-1}\cdots H_1A = \begin{bmatrix} R_1 \\ 0 \end{bmatrix}, \]
where $R_1$ is upper triangular and each $H_k$ is a Householder transformation. Then $A = QR$, where $Q := H_1H_2\cdots H_n$ and $R := \begin{bmatrix} R_1 \\ 0 \end{bmatrix}$.
Set $A_1 = A$ and
\[ A_k = \begin{bmatrix} B_k & C_k \\ 0 & D_k \end{bmatrix}, \]
where $B_k \in \mathbb{R}^{k-1,k-1}$ is upper triangular and $D_k \in \mathbb{R}^{m-k+1,n-k+1}$. At step $k$ we zero out the first column of $D_k$ under the diagonal using $H_k$.
Wilkinson Diagram
\[ \begin{bmatrix} x & x & x \\ x & x & x \\ x & x & x \\ x & x & x \end{bmatrix} \xrightarrow{H_1} \begin{bmatrix} r_{11} & r_{12} & r_{13} \\ 0 & x & x \\ 0 & x & x \\ 0 & x & x \end{bmatrix} \xrightarrow{H_2} \begin{bmatrix} r_{11} & r_{12} & r_{13} \\ 0 & r_{22} & r_{23} \\ 0 & 0 & x \\ 0 & 0 & x \end{bmatrix} \xrightarrow{H_3} \begin{bmatrix} r_{11} & r_{12} & r_{13} \\ 0 & r_{22} & r_{23} \\ 0 & 0 & r_{33} \\ 0 & 0 & 0 \end{bmatrix} \]
Here $A_1 = D_1$, $A_2 = \begin{bmatrix} B_2 & C_2 \\ 0 & D_2 \end{bmatrix}$, $A_3 = \begin{bmatrix} B_3 & C_3 \\ 0 & D_3 \end{bmatrix}$, $A_4 = \begin{bmatrix} R_1 \\ 0 \end{bmatrix}$.
The transformation is applied to the lower right block.
The case $m \le n$:
\[ H_{m-1}\cdots H_1A = [R_1, S_1] = R, \]
where $R_1$ is upper triangular and $S_1 \in \mathbb{R}^{m,n-m}$. If $m = n$ then $S_1 = [\,]$, the empty matrix.
Solving linear systems using the QR factorization
$Ax = b$ and $A = Q_1R_1 \Rightarrow R_1x = Q_1^Tb$.
So Householder triangulation can be used as an alternative to Gaussian elimination.
Let $B \in \mathbb{R}^{m,r}$, where $m = n$. We give an algorithm using Householder transformations to solve the $r$ linear systems $AX = B$. The algorithm also finds the least squares solution of an overdetermined linear system. (More later.)
Algorithm houselsq
The algorithm uses housegen and an upper triangular version of MATLAB's linsolve.

function X = houselsq(A,B)
% Solve AX = B (m = n), or the least squares problem if m > n,
% using Householder triangulation
[m,n] = size(A); [m,r] = size(B); A = [A B];   % augment A with B
for k = 1:min(n,m-1)
    [v,A(k,k)] = housegen(A(k:m,k));           % zero out below the diagonal
    A(k+1:m,k) = zeros(m-k,1);
    A(k:m,k+1:n+r) = A(k:m,k+1:n+r) - v*(v'*A(k:m,k+1:n+r));
end
opts.UT = true;                                % use upper triangular solver
X = linsolve(A(1:n,1:n), A(1:n,n+1:n+r), opts);
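For example, for a square system (a small check of my own; any nonsingular $A$ will do):

A = [2 1; 1 3]; b = [3; 5];
X = houselsq(A,b)    % gives [0.8; 1.4], the same as A\b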
Discussion
Advantages with Householder:
- always works for nonsingular systems
- pivoting not necessary
- numerically stable
Advantages with Gauss:
- half the number of flops compared to Householder
- pivoting often not necessary
- usually stable (but no guarantee)
- lots of computer programs developed for systems with special structure
Rotations
In some applications, the matrix we want to triangulate has a special structure. Suppose for example that $A \in \mathbb{R}^{n,n}$ is square and upper Hessenberg, as illustrated by a Wilkinson diagram for $n = 4$:
\[ A = \begin{bmatrix} x & x & x & x \\ x & x & x & x \\ 0 & x & x & x \\ 0 & 0 & x & x \end{bmatrix}. \]
Only one entry in each column needs to be annihilated, and a full Householder transformation will be inefficient. In this case we can use a simpler transformation.
Givens Rotations
Definition. A plane rotation (also called a Givens rotation) is an orthogonal matrix of the form
\[ P := \begin{bmatrix} c & s \\ -s & c \end{bmatrix}, \quad \text{where } c^2 + s^2 = 1. \]
There is a unique angle $\theta \in [0, 2\pi)$ such that $c = \cos\theta$ and $s = \sin\theta$.
[Figure: clockwise rotation of $x$ to $y = Px$ through the angle $\theta$.]
Zero in one component
Suppose
\[ x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \ne 0, \quad c := \frac{x_1}{r}, \quad s := \frac{x_2}{r}, \quad r := \|x\|_2. \]
Then
\[ Px = \frac{1}{r}\begin{bmatrix} x_1 & x_2 \\ -x_2 & x_1 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \frac{1}{r}\begin{bmatrix} x_1^2 + x_2^2 \\ 0 \end{bmatrix} = \begin{bmatrix} r \\ 0 \end{bmatrix}, \]
and we have introduced a zero in $x$. We can take $P = I$ when $x = 0$.
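A quick numerical check in MATLAB:

x = [3; 4]; r = norm(x);   % r = 5
c = x(1)/r; s = x(2)/r;
P = [c s; -s c];
P*x                        % gives [5; 0] = [r; 0]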
Rotation in the (i,j)-plane
For $1 \le i < j \le n$ we define a rotation in the $(i,j)$-plane as a matrix $P_{ij} = (p_{kl}) \in \mathbb{R}^{n,n}$ with $p_{kl} = \delta_{kl}$, except for positions $ii$, $jj$, $ij$, $ji$, which are given by
\[ \begin{bmatrix} p_{ii} & p_{ij} \\ p_{ji} & p_{jj} \end{bmatrix} = \begin{bmatrix} c & s \\ -s & c \end{bmatrix}, \quad \text{where } c^2 + s^2 = 1. \]
For example, for $n = 4$,
\[ P_{23} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & c & s & 0 \\ 0 & -s & c & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}. \]
Triangulating an upper Hessenberg matrix
\[ A = \begin{bmatrix} x & x & x & x \\ x & x & x & x \\ 0 & x & x & x \\ 0 & 0 & x & x \end{bmatrix} \xrightarrow{P_{12}} \begin{bmatrix} r_{11} & r_{12} & r_{13} & r_{14} \\ 0 & x & x & x \\ 0 & x & x & x \\ 0 & 0 & x & x \end{bmatrix} \xrightarrow{P_{23}} \begin{bmatrix} r_{11} & r_{12} & r_{13} & r_{14} \\ 0 & r_{22} & r_{23} & r_{24} \\ 0 & 0 & x & x \\ 0 & 0 & x & x \end{bmatrix} \xrightarrow{P_{34}} \begin{bmatrix} r_{11} & r_{12} & r_{13} & r_{14} \\ 0 & r_{22} & r_{23} & r_{24} \\ 0 & 0 & r_{33} & r_{34} \\ 0 & 0 & 0 & r_{44} \end{bmatrix}. \]
Premultiplication by $P_{23}$ only affects rows 2 and 3.
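A minimal MATLAB sketch of this reduction (the function name is mine; it assumes $A$ is square and upper Hessenberg):

function R = givens_hessenberg(A)
% Triangulate an upper Hessenberg matrix with Givens rotations
n = size(A,1);
for k = 1:n-1
    r = norm(A(k:k+1,k));               % zero out A(k+1,k)
    if r > 0
        c = A(k,k)/r; s = A(k+1,k)/r;
        G = [c s; -s c];                % rotation in the (k,k+1)-plane
        A(k:k+1,k:n) = G*A(k:k+1,k:n);  % affects rows k and k+1 only
    end
end
R = A;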
Householder and Givens
Householder reduction:
- one column at a time
- suitable for full matrices
- $O(4n^3/3)$ flops
Givens reduction:
- one entry at a time
- suitable for banded and sparse matrices
- $O(8n^3/3)$ flops for a full matrix (2 times the Householder flop count)
Least Squares
Given $A \in \mathbb{C}^{m,n}$ and $b \in \mathbb{C}^m$ with $m \ge n$. Find $x \in \mathbb{C}^n$ such that $Ax = b$.
The system is overdetermined if $m > n$; it might have no solution.
Instead, find $x$ that minimizes $E(x) := \|Ax - b\|_2^2$. To find $x$ which minimizes $E(x)$ (or $\sqrt{E(x)}$) is called the least squares problem.
Existence and uniqueness
Theorem. The least squares problem always has a solution. The solution is unique if and only if $A$ has linearly independent columns. Moreover, the following are equivalent:
1. $x$ is a solution of the least squares problem (LSQ).
2. $A^HAx = A^Hb$.
3. $x = A^\dagger b + z$, where $A^\dagger$ is the pseudo-inverse of $A$ and $z \in \ker(A)$.
We have $\|x\|_2 \ge \|A^\dagger b\|_2$ for all solutions $x$.
Proof
Let $b = b_1 + b_2$, where $b_1 \in \mathrm{span}(A)$ and $b_2 \in \ker(A^H)$ are the orthogonal projections of $b$ into $\mathrm{span}(A)$ and $\ker(A^H)$, respectively.
Since $b_2^Hv = 0$ for any $v \in \mathrm{span}(A)$, we have $b_2^H(b_1 - Ax) = 0$ for any $x \in \mathbb{C}^n$. Then
\[ \|b - Ax\|_2^2 = \|(b_1 - Ax) + b_2\|_2^2 = \|b_1 - Ax\|_2^2 + \|b_2\|_2^2 \ge \|b_2\|_2^2, \]
with equality if and only if $Ax = b_1$. Since $b_1 \in \mathrm{span}(A)$ we can always find such an $x$, and existence follows.
Uniqueness: $b_1$ is unique, and the solution of $Ax = b_1$ is unique if $A$ has linearly independent columns.
Proof (continued)
$1 \Leftrightarrow 2$: $x$ solves LSQ $\Leftrightarrow Ax = b_1 \Leftrightarrow b - Ax = b_2 \in \ker(A^H) \Leftrightarrow A^H(b - Ax) = 0 \Leftrightarrow 2$.
$1 \Leftrightarrow 3$: $x = A^\dagger b + z$ solves LSQ $\Leftrightarrow Ax = b_1 \Leftrightarrow b_1 = AA^\dagger b + Az \Leftrightarrow b_1 = b_1 + Az \Leftrightarrow z \in \ker(A)$, using that $AA^\dagger b = b_1$.
Proof of the minimum norm property
Let $A = U\Sigma V^H = U_1\Sigma_1V_1^H$ be the SVD and SVF of $A$, and write $V = [V_1, V_2]$, where the columns of $V_2$ form a basis for $\ker(A)$.
Suppose $x = A^\dagger b + z$, with $z \in \ker(A)$, is a solution. Then $z = V_2y$ for some $y$, so
\[ z^H(A^\dagger b) = (y^HV_2^H)(V_1\Sigma_1^{-1}U_1^H)b = y^H(V_2^HV_1)\Sigma_1^{-1}U_1^Hb = 0. \]
Pythagoras:
\[ \|x\|_2^2 = \|A^\dagger b + z\|_2^2 = \|A^\dagger b\|_2^2 + \|z\|_2^2 \ge \|A^\dagger b\|_2^2. \]
Solution of LSQ using the QR factorization
Assume that $A$, $b$ are real and that $A$ has linearly independent columns.
Let $A = Q_1R_1$ be the QR factorization of $A$. Then
\[ A^TA = R_1^TQ_1^TQ_1R_1 = R_1^TR_1, \qquad A^Tb = R_1^TQ_1^Tb, \]
and $R_1$ is nonsingular, so
\[ A^TAx = A^Tb \iff R_1x = Q_1^Tb. \]
These are the same equations as for a linear system. We can use houselsq.
Example
\[ A = \begin{bmatrix} 1 & 3 & 1 \\ 1 & 3 & 7 \\ 1 & -1 & -4 \\ 1 & -1 & 2 \end{bmatrix} \quad \text{and} \quad b = \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix}. \]
Using the QR factorization of $A$ from the earlier example, the least squares solution $x$ is found by solving the system $R_1x = Q_1^Tb$:
\[ \begin{bmatrix} 2 & 2 & 3 \\ 0 & 4 & 5 \\ 0 & 0 & 6 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \frac{1}{2}\begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ -1 & 1 & -1 & 1 \end{bmatrix}\begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 2 \\ 0 \\ 0 \end{bmatrix}, \]
and we find $x = [1, 0, 0]^T$.
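This is easy to check numerically, e.g. via the normal equations:

A = [1 3 1; 1 3 7; 1 -1 -4; 1 -1 2]; b = [1; 1; 1; 1];
x = (A'*A)\(A'*b)   % gives [1; 0; 0]
% A\b gives the same result; MATLAB's backslash also uses a QR
% factorization for rectangular systems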
QR without full rank
Theorem. Suppose $m \ge n \ge 1$ and $A \in \mathbb{R}^{m,n}$. Then $A$ has a QR decomposition and a QR factorization.
Proof. Householder triangulation, as in houselsq, can always be used to find a QR decomposition.
Compare with the normal equations
$A = Q_1R_1$ and $R_1x = Q_1^Tb$, compared to $A^TAx = A^Tb$.
Flops: Householder $2mn^2 - 2n^3/3$; normal equations $mn^2$. Householder is twice as expensive for large $m$.
Householder is numerically stable; the normal equations are often ill-conditioned.
Condition number problem: the 2-norm condition number of $B = A^TA$ is $(\sigma_1/\sigma_n)^2$, the square of the condition number of $A$.
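The squaring of the condition number is easy to observe numerically (the matrix below is an illustrative choice of mine):

A = [1 1; 0 1e-4; 0 0];   % nearly linearly dependent columns
cond(A'*A)/cond(A)^2      % approximately 1: cond(A'*A) = cond(A)^2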
Solving LSQ using the SVF
Can be used for rank deficient problems. Must find the pseudoinverse of the matrix. $x = A^\dagger b + z$ is a solution for any $z \in \ker(A)$.
Example:
\[ A = \begin{bmatrix} 1 & 1 \\ 1 & 1 \\ 0 & 0 \end{bmatrix} = U_1\Sigma_1V_1^T = \begin{bmatrix} 1/\sqrt{2} \\ 1/\sqrt{2} \\ 0 \end{bmatrix}\begin{bmatrix} 2 \end{bmatrix}\begin{bmatrix} 1/\sqrt{2} & 1/\sqrt{2} \end{bmatrix}, \]
\[ A^\dagger = V_1\Sigma_1^{-1}U_1^T = \begin{bmatrix} 1/\sqrt{2} \\ 1/\sqrt{2} \end{bmatrix}\begin{bmatrix} 1/2 \end{bmatrix}\begin{bmatrix} 1/\sqrt{2} & 1/\sqrt{2} & 0 \end{bmatrix} = \frac{1}{4}\begin{bmatrix} 1 & 1 & 0 \\ 1 & 1 & 0 \end{bmatrix}. \]
$[1, -1]^T$ is a basis for $\ker(A)$, so
\[ x = \frac{1}{4}\begin{bmatrix} 1 & 1 & 0 \\ 1 & 1 & 0 \end{bmatrix}\begin{bmatrix} b_1 \\ b_2 \\ b_3 \end{bmatrix} + z, \quad z \in \ker(A). \]
This gives all solutions of $\min_x \|Ax - b\|_2$.
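A minimal MATLAB check of this example:

A = [1 1; 1 1; 0 0];
Apinv = pinv(A)      % equals [1 1 0; 1 1 0]/4
b = [1; 2; 3];       % an arbitrary right-hand side
x = Apinv*b          % the minimum norm least squares solution
% all solutions: x + t*[1; -1] for any scalar t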