Orthonormal Transformations and Least Squares


Tom Lyche, Centre of Mathematics for Applications, Department of Informatics, University of Oslo. October 30, 2009.

Applications of $Qx$ with $Q^TQ = I$

1. Solving linear equations (today)
2. Solving least squares problems (today)
3. Solving eigenvalue problems (next time)
4. Finding the singular value decomposition

Stability of orthogonal transformations

We recall: Suppose $Q$ is an orthonormal matrix, that is $Q^TQ = I$. If $\tilde x = x + e$ is an approximation to $x$, then $Q\tilde x = Qx + Qe$ and $\|Qe\|_2 = \|e\|_2$. If $A \in \mathbb{R}^{m,n}$ and $\tilde A = A + E$ is an approximation to $A$, then $Q\tilde A = QA + QE$ and $\|QE\|_2 = \|E\|_2$. Conclusion: when an orthonormal transformation is applied to a vector or a matrix, the error does not grow.
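As a quick numerical check of this norm preservation (the test matrix and error vector are of my own choosing), one can verify in MATLAB/Octave:

% Orthonormal Q preserves the 2-norm: ||Q*e||_2 = ||e||_2.
[Q,~] = qr(randn(5));         % a random 5x5 orthonormal matrix
e = 1e-8*randn(5,1);          % a small error vector
disp(norm(Q*e) - norm(e))     % ~ 0 up to rounding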

QR-Decomposition and Factorization

Definition. Let $A \in \mathbb{C}^{m,n}$ with $m \ge n \ge 1$. We say that $A = QR$ is a QR decomposition of $A$ if $Q \in \mathbb{C}^{m,m}$ is square and unitary and
$$R = \begin{bmatrix} R_1 \\ 0_{m-n,n} \end{bmatrix},$$
where $R_1 \in \mathbb{C}^{n,n}$ is upper triangular and $0_{m-n,n} \in \mathbb{C}^{m-n,n}$ is the zero matrix. We call $A = QR$ a QR factorization of $A$ if $Q \in \mathbb{C}^{m,n}$ has orthonormal columns and $R \in \mathbb{C}^{n,n}$ is upper triangular.

QR decomposition vs. QR factorization

It is easy to construct a QR factorization from a QR decomposition $A = QR$: simply partition $Q = [Q_1, Q_2]$ and $R = \begin{bmatrix} R_1 \\ 0 \end{bmatrix}$, where $Q_1 \in \mathbb{R}^{m,n}$ and $R_1 \in \mathbb{R}^{n,n}$. Then $A = Q_1R_1$ is a QR factorization of $A$. Conversely, we construct the QR decomposition from the QR factorization $A = Q_1R_1$ by extending the columns of $Q_1$ to an orthonormal basis $Q = [Q_1, Q_2]$ for $\mathbb{R}^m$ and defining $R = \begin{bmatrix} R_1 \\ 0 \end{bmatrix} \in \mathbb{R}^{m,n}$. If $m = n$ then a factorization is also a decomposition and vice versa.

An example of a QR decomposition is
$$A = \begin{bmatrix} 1 & 3 & 1\\ 1 & 3 & 7\\ 1 & -1 & -4\\ 1 & -1 & 2 \end{bmatrix} = \frac12\begin{bmatrix} 1 & 1 & -1 & 1\\ 1 & 1 & 1 & -1\\ 1 & -1 & -1 & -1\\ 1 & -1 & 1 & 1 \end{bmatrix}\begin{bmatrix} 2 & 2 & 3\\ 0 & 4 & 5\\ 0 & 0 & 6\\ 0 & 0 & 0 \end{bmatrix} = QR,$$
while a QR factorization $A = Q_1R_1$ is obtained by dropping the last column of $Q$ and the last row of $R$, so that
$$A = \frac12\begin{bmatrix} 1 & 1 & -1\\ 1 & 1 & 1\\ 1 & -1 & -1\\ 1 & -1 & 1 \end{bmatrix}\begin{bmatrix} 2 & 2 & 3\\ 0 & 4 & 5\\ 0 & 0 & 6 \end{bmatrix} = Q_1R_1.$$
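This factorization is easy to verify numerically; a short MATLAB/Octave check with the matrices copied from the example:

A = [1 3 1; 1 3 7; 1 -1 -4; 1 -1 2];
Q = [1 1 -1 1; 1 1 1 -1; 1 -1 -1 -1; 1 -1 1 1]/2;
R = [2 2 3; 0 4 5; 0 0 6; 0 0 0];
disp(norm(A - Q*R))           % 0: A = QR
disp(norm(Q'*Q - eye(4)))     % 0: Q is orthonormal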

Existence of QR Factorization

Theorem. Suppose $A \in \mathbb{C}^{m,n}$ with $m \ge n \ge 1$ has linearly independent columns. Then $A$ has a QR decomposition and a QR factorization. The QR factorization is unique if $R_1$ has positive diagonal elements.

Proof

Existence: $A^TA$ is symmetric positive definite, so it has a Cholesky factorization $A^TA = R_1^TR_1$, where $R_1 \in \mathbb{R}^{n,n}$ is upper triangular and nonsingular. Define $Q_1 := AR_1^{-1}$. Then $Q_1^TQ_1 = R_1^{-T}A^TAR_1^{-1} = I$, so $A = Q_1R_1$ is a QR factorization.

Uniqueness: If $A = Q_1R_1$ then $A^TA = R_1^TR_1$. Since the Cholesky factorization is unique, $R_1$ is unique, and hence $Q_1 = AR_1^{-1}$ is unique.
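The existence proof is constructive and can be sketched in MATLAB/Octave (using the example matrix from above; full column rank is assumed):

A = [1 3 1; 1 3 7; 1 -1 -4; 1 -1 2];
R1 = chol(A'*A);              % Cholesky factorization A'*A = R1'*R1
Q1 = A/R1;                    % Q1 := A*R1^(-1)
disp(norm(Q1'*Q1 - eye(3)))   % ~ 0: Q1 has orthonormal columns
disp(norm(A - Q1*R1))         % ~ 0: A = Q1*R1

Note that this is a proof device rather than a recommended algorithm: forming $A^TA$ squares the condition number, as discussed at the end of the lecture.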

Hadamard's inequality

Theorem (Hadamard's Inequality). For any $A = [a_1, \ldots, a_n] \in \mathbb{C}^{n,n}$ we have
$$|\det(A)| \le \prod_{j=1}^n \|a_j\|_2. \qquad (1)$$
Equality holds if and only if $A$ has a zero column or the columns of $A$ are orthogonal.

Example: $A = \begin{bmatrix} 2 & 1\\ 1 & 2 \end{bmatrix}$, $|\det(A)| = 3 \le \|a_1\|_2\|a_2\|_2 = 5$.

Proof

Let $A = QR$ be a QR decomposition of $A$. Then $1 = \det(I) = \det(Q^HQ) = \overline{\det(Q)}\det(Q)$, so $|\det(Q)| = 1$. With $R = [r_1, \ldots, r_n]$ we have $(A^HA)_{jj} = \|a_j\|_2^2 = (R^HR)_{jj} = \|r_j\|_2^2$, and therefore
$$|\det(A)| = |\det(QR)| = |\det(R)| = \prod_{j=1}^n |r_{jj}| \le \prod_{j=1}^n \|r_j\|_2 = \prod_{j=1}^n \|a_j\|_2.$$
If equality holds, then either $\det(A) = 0$ and $A$ has a zero column, or $\det(A) \ne 0$ and $|r_{jj}| = \|r_j\|_2$ for $j = 1, \ldots, n$. In the latter case $R$ is diagonal, so $A^HA = R^HR$ is diagonal, and the columns of $A$ are orthogonal.

QR and Gram-Schmidt

If $A \in \mathbb{R}^{m,n}$ has rank $n$, then the columns $\{a_1, \ldots, a_n\}$ form a basis for $\operatorname{span}(A)$ and the Gram-Schmidt orthogonalization process takes the form
$$v_1 = a_1, \qquad v_j = a_j - \sum_{i=1}^{j-1}\frac{a_j^Tv_i}{v_i^Tv_i}v_i, \quad j = 2, \ldots, n.$$
Then $\{v_1, \ldots, v_n\}$ is an orthogonal basis for $\operatorname{span}(A)$, and
$$a_1 = v_1, \qquad a_j = \sum_{i=1}^{j-1}\rho_{ij}v_i + v_j, \quad \text{where } \rho_{ij} = \frac{a_j^Tv_i}{v_i^Tv_i}.$$

GS = QR factorization

The relations $a_1 = v_1$ and $a_j = \sum_{i=1}^{j-1}\rho_{ij}v_i + v_j$ can be written $A = V\hat R$, where $V := [v_1, \ldots, v_n] \in \mathbb{R}^{m,n}$ and $\hat R$ is unit upper triangular. Since $\{v_1, \ldots, v_n\}$ is a basis for $\operatorname{span}(A)$, the matrix $D := \operatorname{diag}(\|v_1\|_2, \ldots, \|v_n\|_2)$ is nonsingular and the matrix
$$Q_1 := VD^{-1} = \Bigl[\tfrac{v_1}{\|v_1\|_2}, \ldots, \tfrac{v_n}{\|v_n\|_2}\Bigr]$$
has orthonormal columns. Therefore $A = Q_1R_1$, with $R_1 := D\hat R$, is a QR factorization of $A$ with positive diagonal entries in $R_1$.
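The correspondence can be turned into a small MATLAB/Octave function (the function name is my own; this is the classical process, written to follow the formulas above):

function [Q1,R1] = gramschmidt_qr(A)
  % Classical Gram-Schmidt QR factorization of a full rank A.
  [m,n] = size(A);
  V = zeros(m,n); Rhat = eye(n);            % Rhat is unit upper triangular
  for j = 1:n
    v = A(:,j);
    for i = 1:j-1
      Rhat(i,j) = (A(:,j)'*V(:,i))/(V(:,i)'*V(:,i));  % rho_ij
      v = v - Rhat(i,j)*V(:,i);
    end
    V(:,j) = v;
  end
  D = diag(sqrt(sum(V.^2)));                % D = diag(||v_1||_2,...,||v_n||_2)
  Q1 = V/D;                                 % Q1 = V*D^(-1)
  R1 = D*Rhat;                              % R1 = D*Rhat has positive diagonal

As the next slide notes, this classical process is numerically unstable; it is shown only to make the GS = QR correspondence concrete.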

Finding QR

Gram-Schmidt is numerically unstable. Use instead orthogonal transformations, which are stable. We study Householder transformations and Givens rotations.

Householder Transformations

A matrix $H \in \mathbb{R}^{n,n}$ of the form $H := I - uu^T$, where $u \in \mathbb{R}^n$ and $u^Tu = 2$, is called a Householder transformation. For $n = 2$ we find
$$H = \begin{bmatrix} 1 - u_1^2 & -u_1u_2\\ -u_2u_1 & 1 - u_2^2 \end{bmatrix}.$$
A Householder transformation is symmetric and orthonormal. In particular,
$$H^TH = H^2 = (I - uu^T)(I - uu^T) = I - 2uu^T + u(u^Tu)u^T = I.$$

Alternative representations

1. Householder used $I - 2uu^T$, where $u^Tu = 1$.
2. For any nonzero $v \in \mathbb{R}^n$ the matrix $H := I - 2\frac{vv^T}{v^Tv}$ is a Householder transformation. In fact $H = I - uu^T$, where $u := \sqrt{2}\,\frac{v}{\|v\|_2}$.
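Both properties are easy to check numerically (random $v$ of my own choosing):

v = randn(4,1);
u = sqrt(2)*v/norm(v);        % u'*u = 2
H = eye(4) - u*u';
disp(norm(H - H'))            % 0: H is symmetric
disp(norm(H'*H - eye(4)))     % ~ 0: H is orthonormal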

Transformation Lemma

Lemma. Suppose $x, y \in \mathbb{R}^n$ with $\|x\|_2 = \|y\|_2$ and $v := x - y \ne 0$. Then
$$\Bigl(I - 2\frac{vv^T}{v^Tv}\Bigr)x = y.$$
Proof. Since $x^Tx = y^Ty$ we have
$$v^Tv = (x - y)^T(x - y) = 2x^Tx - 2y^Tx = 2v^Tx. \qquad (2)$$
But then
$$\Bigl(I - 2\frac{vv^T}{v^Tv}\Bigr)x = x - \frac{2v^Tx}{v^Tv}v = x - v = y.$$

Geometric Interpretation

$Hx = y$: the vector $Hx$ is the mirror image (reflection) of $x$. Write
$$H = I - 2\frac{vv^T}{v^Tv} = P - \frac{vv^T}{v^Tv}, \quad \text{where } P := I - \frac{vv^T}{v^Tv}.$$
Then $Px$ is the orthogonal projection of $x$ onto $\operatorname{span}\{x + y\}$:
$$Px = x - \frac{v^Tx}{v^Tv}v \overset{(2)}{=} x - \frac12v = \frac12(x + y).$$
[Figure: reflection of $x$ across the mirror line through $Px$ onto $y = Hx$.]

Zero out entries

Given $x \in \mathbb{R}^n \setminus \{0\}$, take $y = \alpha e_1$. If $\alpha^2 = x^Tx$ then $\|x\|_2 = \|y\|_2$. There are two solutions, $\alpha = +\|x\|_2$ and $\alpha = -\|x\|_2$. We choose $\alpha$ to have the opposite sign of $x_1$. Then $y \ne x$, so $H$ exists even if $x = e_1$, and we obtain numerical stability since we avoid cancellation in the subtraction $v_1 = x_1 - \alpha$.

Formulas for an algorithm

Lemma. For a nonzero vector $x \in \mathbb{R}^n$ we define
$$\alpha := \begin{cases} -\|x\|_2 & \text{if } x_1 > 0,\\ +\|x\|_2 & \text{otherwise,} \end{cases} \qquad (3)$$
and $H := I - uu^T$ with
$$u = \frac{x/\alpha - e_1}{\sqrt{1 - x_1/\alpha}}. \qquad (4)$$
Then $H$ is a Householder transformation and $Hx = \alpha e_1$.

Proof

Let $y := \alpha e_1$ and $v = x - y$. By construction $x^Tx = y^Ty$ and $v \ne 0$. By the Transformation Lemma we have $Hx = \alpha e_1$, where $H = I - 2\frac{vv^T}{v^Tv}$ is a Householder transformation. Since
$$0 < v^Tv = (x - \alpha e_1)^T(x - \alpha e_1) = x^Tx - 2\alpha x_1 + \alpha^2 = 2\alpha(\alpha - x_1),$$
we get
$$H = I - \frac{2(x - \alpha e_1)(x - \alpha e_1)^T}{2\alpha(\alpha - x_1)} = I - \frac{(x/\alpha - e_1)(x/\alpha - e_1)^T}{1 - x_1/\alpha} = I - uu^T.$$

Example

For $x := [1, 2, 2]^T$ we have $\|x\|_2 = 3$, and since $x_1 > 0$ we choose $\alpha = -3$. We find $u = [2, 1, 1]^T/\sqrt{3}$ and
$$H = I - \frac13\begin{bmatrix} 2\\ 1\\ 1 \end{bmatrix}\begin{bmatrix} 2 & 1 & 1 \end{bmatrix} = \frac13\begin{bmatrix} -1 & -2 & -2\\ -2 & 2 & -1\\ -2 & -1 & 2 \end{bmatrix}.$$
We see that $Hx = -3e_1$.
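In MATLAB/Octave:

x = [1; 2; 2];
u = [2; 1; 1]/sqrt(3);
H = eye(3) - u*u';
disp((H*x)')                  % [-3 0 0], i.e. Hx = -3*e_1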

What about $x = 0$?

To simplify later algorithms we define a Householder transformation also for $x = 0$. Since $x = 0$ implies $Ux = 0$ for any $U$, we can use $u := \sqrt{2}e_1$; then $I - uu^T$ is a Householder transformation and $Hx = 0 = \alpha e_1$ with $\alpha = 0$.

Algorithm housegen

Given $x \in \mathbb{R}^n$, the following algorithm computes $a = \alpha$ and the vector $u$ so that $(I - uu^T)x = \alpha e_1$.

function [u,a] = housegen(x)
  a = norm(x); u = x;
  if a == 0
    u(1) = sqrt(2); return;   % x = 0: use u = sqrt(2)*e_1
  end
  if u(1) > 0
    a = -a;                   % choose alpha with sign opposite to x_1
  end
  u = u/a; u(1) = u(1) - 1;   % u = x/alpha - e_1
  u = u/sqrt(-u(1));          % scale so that u'*u = 2, see (4)
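A quick test of housegen (random test vector of my own choosing):

x = randn(5,1);
[u,a] = housegen(x);
y = x - u*(u'*x);             % y = (I - u*u')*x
disp(norm(y - a*eye(5,1)))    % ~ 0: y = a*e_1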

Zero out only part of a vector

Suppose $y \in \mathbb{R}^k$, $z \in \mathbb{R}^{n-k}$ and $\alpha^2 = z^Tz$. Consider finding a Householder transformation $H$ such that
$$H\begin{bmatrix} y\\ z \end{bmatrix} = \begin{bmatrix} y\\ \alpha e_1 \end{bmatrix}.$$
Let $[\hat u, \alpha] = \mathrm{housegen}(z)$ and set $u^T = [0^T, \hat u^T]$. Then
$$H = I - uu^T = \begin{bmatrix} I & 0\\ 0 & I \end{bmatrix} - \begin{bmatrix} 0\\ \hat u \end{bmatrix}\begin{bmatrix} 0^T & \hat u^T \end{bmatrix} = \begin{bmatrix} I & 0\\ 0 & \hat H \end{bmatrix}, \quad \text{where } \hat H = I - \hat u\hat u^T.$$
Since $u^Tu = \hat u^T\hat u = 2$, we see that $H$ and $\hat H$ are Householder transformations.

Householder triangulation

Let $A \in \mathbb{R}^{m,n}$ with $m > n$. We compute
$$H_nH_{n-1}\cdots H_1A = \begin{bmatrix} R_1\\ 0 \end{bmatrix},$$
where $R_1$ is upper triangular and each $H_k$ is a Householder transformation. Then $A = QR$, where $Q := H_1H_2\cdots H_n$ and $R := \begin{bmatrix} R_1\\ 0 \end{bmatrix}$. Starting from $A_1 = A$, at step $k$ we have
$$A_k = \begin{bmatrix} B_k & C_k\\ 0 & D_k \end{bmatrix},$$
where $B_k \in \mathbb{R}^{k-1,k-1}$ is upper triangular and $D_k \in \mathbb{R}^{m-k+1,n-k+1}$, and we zero out the first column of $D_k$ under the diagonal using $H_k$.

Wilkinson Diagram

$$\begin{bmatrix} x&x&x\\ x&x&x\\ x&x&x\\ x&x&x \end{bmatrix} \xrightarrow{H_1} \begin{bmatrix} r_{11}&r_{12}&r_{13}\\ 0&x&x\\ 0&x&x\\ 0&x&x \end{bmatrix} \xrightarrow{H_2} \begin{bmatrix} r_{11}&r_{12}&r_{13}\\ 0&r_{22}&r_{23}\\ 0&0&x\\ 0&0&x \end{bmatrix} \xrightarrow{H_3} \begin{bmatrix} r_{11}&r_{12}&r_{13}\\ 0&r_{22}&r_{23}\\ 0&0&r_{33}\\ 0&0&0 \end{bmatrix}.$$
Here $A_1 = D_1$, $A_2 = \begin{bmatrix} B_2 & C_2\\ 0 & D_2 \end{bmatrix}$, $A_3 = \begin{bmatrix} B_3 & C_3\\ 0 & D_3 \end{bmatrix}$, $A_4 = \begin{bmatrix} R_1\\ 0 \end{bmatrix}$, and each transformation is applied to the lower right block $D_k$.
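The triangulation just described can be sketched as a MATLAB/Octave function (the function name is my own; it reuses housegen and accumulates $Q$ explicitly, whereas in practice one would store the vectors $u_k$ instead):

function [Q,R] = house_qr(A)
  % Householder triangulation: H_n*...*H_1*A = R, Q = H_1*...*H_n.
  [m,n] = size(A);
  Q = eye(m);
  for k = 1:min(n,m-1)
    [u,A(k,k)] = housegen(A(k:m,k));                   % zero out column k below diagonal
    A(k+1:m,k) = 0;
    A(k:m,k+1:n) = A(k:m,k+1:n) - u*(u'*A(k:m,k+1:n)); % apply H_k to the block D_k
    Q(:,k:m) = Q(:,k:m) - (Q(:,k:m)*u)*u';             % accumulate Q := Q*H_k
  end
  R = triu(A);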

If $m \le n$ we obtain
$$H_{m-1}\cdots H_1A = [R_1, S_1] = R,$$
where $R_1$ is upper triangular and $S_1 \in \mathbb{R}^{m,n-m}$. If $m = n$ then $S_1 = [\,]$, the empty matrix.

Solving linear systems using the QR factorization

From $Ax = b$ and $A = Q_1R_1$ we get $R_1x = Q_1^Tb$. So Householder triangulation can be used as an alternative to Gaussian elimination. Let $B \in \mathbb{R}^{m,r}$, where $m = n$. We give an algorithm using Householder transformations to solve the $r$ linear systems $AX = B$. The algorithm also finds the least squares solution of an overdetermined linear system (more later).

Algorithm houselsq

The algorithm uses housegen and an upper triangular version of MATLAB's linsolve.

function X = houselsq(A,B)
  [m,n] = size(A); [~,r] = size(B);
  A = [A B];                          % work on the augmented matrix [A B]
  for k = 1:min(n,m-1)
    [v,A(k,k)] = housegen(A(k:m,k));  % Householder vector for column k
    A(k+1:m,k) = zeros(m-k,1);        % entries below the diagonal are now zero
    A(k:m,k+1:n+r) = A(k:m,k+1:n+r) - v*(v'*A(k:m,k+1:n+r)); % apply H_k to the rest
  end
  opts.UT = true;                     % use upper triangular solver
  X = linsolve(A(1:n,1:n), A(1:n,n+1:n+r), opts);
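For example, with the matrix from the QR example earlier (its least squares interpretation is explained later in the lecture):

A = [1 3 1; 1 3 7; 1 -1 -4; 1 -1 2];
b = [1; 1; 1; 1];
x = houselsq(A,b)                     % returns [1; 0; 0]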

Discussion

Advantages with Householder:
- always works for nonsingular systems
- pivoting not necessary
- numerically stable

Advantages with Gaussian elimination:
- half the number of flops compared to Householder
- pivoting often not necessary
- usually stable (but no guarantee)
- lots of computer programs developed for systems with special structure

Rotations

In some applications, the matrix we want to triangulate has a special structure. Suppose for example that $A \in \mathbb{R}^{n,n}$ is square and upper Hessenberg, as illustrated by a Wilkinson diagram for $n = 4$:
$$A = \begin{bmatrix} x & x & x & x\\ x & x & x & x\\ 0 & x & x & x\\ 0 & 0 & x & x \end{bmatrix}.$$
Only one entry in each column needs to be annihilated, so a full Householder transformation would be inefficient. In this case we can use a simpler transformation.

Givens Rotations

Definition. A plane rotation (also called a Givens rotation) is an orthogonal matrix of the form
$$P := \begin{bmatrix} c & s\\ -s & c \end{bmatrix}, \quad \text{where } c^2 + s^2 = 1.$$
There is a unique angle $\theta \in [0, 2\pi)$ such that $c = \cos\theta$ and $s = \sin\theta$.
[Figure: clockwise rotation of $x$ through the angle $\theta$ onto $y = Px$.]

Zero in one component

Suppose
$$x = \begin{bmatrix} x_1\\ x_2 \end{bmatrix} \ne 0, \qquad r := \|x\|_2, \quad c := \frac{x_1}{r}, \quad s := \frac{x_2}{r}.$$
Then
$$Px = \frac1r\begin{bmatrix} x_1 & x_2\\ -x_2 & x_1 \end{bmatrix}\begin{bmatrix} x_1\\ x_2 \end{bmatrix} = \frac1r\begin{bmatrix} x_1^2 + x_2^2\\ 0 \end{bmatrix} = \begin{bmatrix} r\\ 0 \end{bmatrix},$$
and we have introduced a zero in $x$. We can take $P = I$ when $x = 0$.
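In code these formulas become (the function name is my own; hypot avoids overflow when computing $r$):

function [c,s] = givensrot(x1,x2)
  % c,s with c^2 + s^2 = 1 such that [c s; -s c]*[x1; x2] = [r; 0].
  r = hypot(x1,x2);
  if r == 0
    c = 1; s = 0;             % x = 0: take P = I
  else
    c = x1/r; s = x2/r;
  end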

Rotation in the $i,j$-plane

For $1 \le i < j \le n$ we define a rotation in the $i,j$-plane as a matrix $P_{ij} = (p_{kl}) \in \mathbb{R}^{n,n}$ with $p_{kl} = \delta_{kl}$, except for the positions $ii, jj, ij, ji$, which are given by
$$\begin{bmatrix} p_{ii} & p_{ij}\\ p_{ji} & p_{jj} \end{bmatrix} = \begin{bmatrix} c & s\\ -s & c \end{bmatrix}, \quad \text{where } c^2 + s^2 = 1.$$
For example, for $n = 4$,
$$P_{23} = \begin{bmatrix} 1 & 0 & 0 & 0\\ 0 & c & s & 0\\ 0 & -s & c & 0\\ 0 & 0 & 0 & 1 \end{bmatrix}.$$

Triangulating an upper Hessenberg matrix

$$\begin{bmatrix} x&x&x&x\\ x&x&x&x\\ 0&x&x&x\\ 0&0&x&x \end{bmatrix} \xrightarrow{P_{12}} \begin{bmatrix} r_{11}&r_{12}&r_{13}&r_{14}\\ 0&x&x&x\\ 0&x&x&x\\ 0&0&x&x \end{bmatrix} \xrightarrow{P_{23}} \begin{bmatrix} r_{11}&r_{12}&r_{13}&r_{14}\\ 0&r_{22}&r_{23}&r_{24}\\ 0&0&x&x\\ 0&0&x&x \end{bmatrix} \xrightarrow{P_{34}} \begin{bmatrix} r_{11}&r_{12}&r_{13}&r_{14}\\ 0&r_{22}&r_{23}&r_{24}\\ 0&0&r_{33}&r_{34}\\ 0&0&0&r_{44} \end{bmatrix}.$$
Premultiplication by $P_{23}$ only affects rows 2 and 3, and similarly for the other rotations.
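A sketch of this reduction in MATLAB/Octave (function names are my own; givensrot is the helper defined above):

function R = hess_qr(H)
  % Triangulate an upper Hessenberg H: R = P_{n-1,n}*...*P_{12}*H.
  n = size(H,1);
  for k = 1:n-1
    [c,s] = givensrot(H(k,k), H(k+1,k));
    H(k:k+1,k:n) = [c s; -s c]*H(k:k+1,k:n);  % only rows k and k+1 change
    H(k+1,k) = 0;                             % the annihilated entry, set exactly to zero
  end
  R = H;

Since each rotation touches only two rows, this costs $O(n^2)$ flops for an upper Hessenberg matrix.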

Householder and Givens

Householder reduction: one column at a time; suitable for full matrices; $O(4n^3/3)$ flops.

Givens reduction: one entry at a time; suitable for banded and sparse matrices; $O(8n^3/3)$ flops for a full matrix (twice the Householder flop count).

Least Squares

Given $A \in \mathbb{C}^{m,n}$ and $b \in \mathbb{C}^m$ with $m \ge n$, find $x \in \mathbb{C}^n$ such that $Ax = b$. The system is overdetermined if $m > n$ and might have no solution. Instead we find $x$ that minimizes $E(x) := \|Ax - b\|_2^2$. To find $x$ which minimizes $E(x)$ (or $\sqrt{E(x)}$) is called the least squares problem (LSQ).

Existence and uniqueness

Theorem. The least squares problem always has a solution. The solution is unique if and only if $A$ has linearly independent columns. Moreover, the following are equivalent:
1. $x$ is a solution of the least squares problem (LSQ).
2. $A^HAx = A^Hb$.
3. $x = A^\dagger b + z$, where $A^\dagger$ is the pseudo-inverse of $A$ and $z \in \ker(A)$.
We have $\|x\|_2 \ge \|A^\dagger b\|_2$ for all solutions $x$.

Proof

Let $b = b_1 + b_2$, where $b_1 \in \operatorname{span}(A)$ and $b_2 \in \ker(A^H)$ are the orthogonal projections of $b$ onto $\operatorname{span}(A)$ and $\ker(A^H)$, respectively. Since $b_2^Hv = 0$ for any $v \in \operatorname{span}(A)$, we have $b_2^H(b_1 - Ax) = 0$ for any $x \in \mathbb{C}^n$. Therefore
$$\|b - Ax\|_2^2 = \|(b_1 - Ax) + b_2\|_2^2 = \|b_1 - Ax\|_2^2 + \|b_2\|_2^2 \ge \|b_2\|_2^2,$$
with equality if and only if $Ax = b_1$. Since $b_1 \in \operatorname{span}(A)$ we can always find such an $x$, and existence follows. Uniqueness: $b_1$ is unique, and the solution of $Ax = b_1$ is unique if $A$ has linearly independent columns.

Proof (continued)

$1 \Leftrightarrow 2$: $x$ solves LSQ $\iff Ax = b_1 \iff b - Ax = b_2 \in \ker(A^H) \iff A^H(b - Ax) = 0$, which is 2.

$1 \Leftrightarrow 3$: $x = A^\dagger b + z$ solves LSQ $\iff Ax = b_1$. Since $AA^\dagger b = b_1$, this holds $\iff b_1 = b_1 + Az \iff Az = 0 \iff z \in \ker(A)$.

Proof (minimum norm)

Let $A = U\Sigma V^H = U_1\Sigma_1V_1^H$ be the SVD and SVF of $A$, and write $V = [V_1, V_2]$, where the columns of $V_2$ form a basis for $\ker(A)$. Suppose $x = A^\dagger b + z$, with $z \in \ker(A)$, is a solution. Then $z = V_2y$ for some $y$, so
$$z^H(A^\dagger b) = (y^HV_2^H)(V_1\Sigma_1^{-1}U_1^Hb) = y^H(V_2^HV_1)\Sigma_1^{-1}U_1^Hb = 0,$$
since $V_2^HV_1 = 0$. By Pythagoras,
$$\|x\|_2^2 = \|A^\dagger b + z\|_2^2 = \|A^\dagger b\|_2^2 + \|z\|_2^2 \ge \|A^\dagger b\|_2^2.$$

Solution of LSQ using the QR factorization

Assume that $A$ and $b$ are real and that $A$ has linearly independent columns. Let $A = Q_1R_1$ be the QR factorization of $A$. Then
$$A^TA = R_1^TQ_1^TQ_1R_1 = R_1^TR_1, \qquad A^Tb = R_1^TQ_1^Tb,$$
and $R_1$ is nonsingular, so
$$A^TAx = A^Tb \iff R_1x = Q_1^Tb.$$
These are the same equations as for a linear system, and we can use houselsq.

Recall the QR decomposition and the QR factorization $A = Q_1R_1$ computed in the example earlier.

Example

Let
$$A = \begin{bmatrix} 1 & 3 & 1\\ 1 & 3 & 7\\ 1 & -1 & -4\\ 1 & -1 & 2 \end{bmatrix} \quad\text{and}\quad b = \begin{bmatrix} 1\\ 1\\ 1\\ 1 \end{bmatrix}.$$
The least squares solution $x$ is found by solving the system $R_1x = Q_1^Tb$,
$$\begin{bmatrix} 2 & 2 & 3\\ 0 & 4 & 5\\ 0 & 0 & 6 \end{bmatrix}\begin{bmatrix} x_1\\ x_2\\ x_3 \end{bmatrix} = \frac12\begin{bmatrix} 1 & 1 & 1 & 1\\ 1 & 1 & -1 & -1\\ -1 & 1 & -1 & 1 \end{bmatrix}\begin{bmatrix} 1\\ 1\\ 1\\ 1 \end{bmatrix} = \begin{bmatrix} 2\\ 0\\ 0 \end{bmatrix},$$
and we find $x = [1, 0, 0]^T$.


QR without full rank

Theorem. Suppose $m \ge n \ge 1$ and $A \in \mathbb{R}^{m,n}$. Then $A$ has a QR decomposition and a QR factorization.

Proof. houselsq can always be used to find a QR decomposition: since housegen is defined also for a zero vector, the triangulation runs to completion even when $A$ does not have full rank.

Compare with normal equations

$A = QR$ and $R_1x = Q_1^Tb$, compared to $A^TAx = A^Tb$.
Flops: Householder $\approx 2mn^2 - 2n^3/3$; normal equations $\approx mn^2$. Householder is twice as expensive for large $m$.
Householder is numerically stable; the normal equations are often ill-conditioned.
Condition number problem: the 2-norm condition number of $B = A^TA$ is $(\sigma_1/\sigma_n)^2$, the square of the condition number of $A$.
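The squaring of the condition number is easy to observe (illustrative matrix of my own choosing):

A = [1 1; 1e-6 0; 0 1e-6];
disp(cond(A))                 % ~ 1.4e6
disp(cond(A'*A))              % ~ 2e12, the square of cond(A)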

Solving LSQ using SVF

This can be used for rank-deficient problems, but we must find the pseudoinverse of the matrix: $x = A^\dagger b + z$ is a solution for any $z \in \ker(A)$. Example:
$$A = \begin{bmatrix} 1 & 1\\ 1 & 1\\ 0 & 0 \end{bmatrix} = U_1\Sigma_1V_1^T = \begin{bmatrix} 1/\sqrt2\\ 1/\sqrt2\\ 0 \end{bmatrix}\bigl[2\bigr]\bigl[1/\sqrt2 \;\; 1/\sqrt2\bigr],$$
$$A^\dagger = V_1\Sigma_1^{-1}U_1^T = \begin{bmatrix} 1/\sqrt2\\ 1/\sqrt2 \end{bmatrix}\bigl[1/2\bigr]\bigl[1/\sqrt2 \;\; 1/\sqrt2 \;\; 0\bigr] = \frac14\begin{bmatrix} 1 & 1 & 0\\ 1 & 1 & 0 \end{bmatrix}.$$
Since $[1, -1]^T$ is a basis for $\ker(A)$, all solutions of $\min_x \|Ax - b\|_2$ are given by
$$x = \frac14\begin{bmatrix} 1 & 1 & 0\\ 1 & 1 & 0 \end{bmatrix}\begin{bmatrix} b_1\\ b_2\\ b_3 \end{bmatrix} + z = \frac{b_1 + b_2}{4}\begin{bmatrix} 1\\ 1 \end{bmatrix} + \alpha\begin{bmatrix} 1\\ -1 \end{bmatrix}, \quad \alpha \in \mathbb{R}.$$
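MATLAB's pinv computes $A^\dagger$ from the SVD; for the example above (with a right-hand side of my own choosing):

A = [1 1; 1 1; 0 0];
b = [1; 1; 0];                % an example right-hand side
x = pinv(A)*b                 % [0.5; 0.5] = ((b1+b2)/4)*[1;1], the minimum norm solution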