A Vector Space Justification of Householder Orthogonalization

Ronald Christensen
Professor of Statistics
Department of Mathematics and Statistics
University of New Mexico

August 28, 2015

Abstract

We demonstrate explicitly that the Householder transformation method of performing a QR decomposition is equivalent to performing Gram-Schmidt orthonormalization.

KEY WORDS: Gram-Schmidt, Householder transformations, QR decomposition.

1. Introduction

For a standard statistical linear model
$$Y = X\beta + e, \qquad \mathrm{E}(e) = 0, \qquad \mathrm{Cov}(e) = \sigma^2 I,$$
the fundamental statistic is the least squares vector of predicted values, $\hat Y \equiv X\hat\beta$. This vector is the perpendicular projection of $Y$ into the vector space spanned by the columns of $X$, often called the column space of $X$ and written $C(X)$. Finding the perpendicular projection essentially requires finding an orthonormal basis for $C(X)$. The most common way of doing that seems to be finding a QR decomposition of $X$. The purpose of this work is to give a vector (Hilbert) space justification for one of the most common, numerically stable methods for finding a QR decomposition: the method based on Householder matrices.

Consider an $n \times p$ matrix $X$ with $r(X) = r$. A QR decomposition of $X$ is a characterization $X = QR$ where $Q_{n \times r}$ has orthonormal columns and $R_{r \times p}$ is upper triangular. The columns of $X$ and $Q$ are the vectors $x_j$ and $o_j$, respectively, so $\{o_1, \ldots, o_r\}$ is an orthonormal basis for $C(X)$. Numerically stable approaches to finding a QR decomposition typically find a sequence of orthogonal matrices $Q_1, \ldots, Q_t$ that upper triangularize $X$, i.e.,
$$Q_t \cdots Q_1 X = \tilde R \equiv \begin{bmatrix} R_{r \times p} \\ 0_{(n-r) \times p} \end{bmatrix},$$
where $\tilde R$ is $n \times p$ and $R$ is upper triangular. Note that
$$\tilde Q \equiv [Q_t \cdots Q_1]'$$
is an orthogonal matrix because it is the product of orthogonal matrices, so that $X = \tilde Q \tilde R$. The QR decomposition then involves the aforementioned $R$ with $Q$ consisting of the first $r$ columns of $\tilde Q$. The columns of $Q$ are orthonormal and provide a spanning set for $C(X)$, hence give an orthonormal basis. Although this is a nicely terse argument, it is an extremely indirect approach to finding an orthonormal basis. Typically this approach to QR is performed using Householder or Givens transformations.

As discussed in the next section, another method for producing the QR decomposition, and a far more transparent method for producing an orthonormal basis, uses the Gram-Schmidt (G-S) algorithm. Harville (1997) points out that the QR decomposition is unique (for full rank $X$), from which it follows that any alternative form of finding QR must ultimately reduce to the G-S algorithm. We demonstrate this equivalence with G-S for the Householder method. (Givens seems less amenable to this purpose.) In fact, our focus is not on finding the QR decomposition but on finding the matrix $Q$.
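As a small numerical aside (my own sketch, not part of the paper), the following Python code uses NumPy's Householder-based QR routine to compute the perpendicular projection $\hat Y = QQ'Y$ and confirms that it matches the least squares fitted values $X\hat\beta$; the matrix dimensions and random seed are arbitrary choices made for illustration.

```python
# Minimal sketch: the columns of Q form an orthonormal basis for C(X),
# so the perpendicular projection of Y onto C(X) is QQ'Y = X betahat.
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, 3
X = rng.standard_normal((n, p))          # design matrix, full rank with probability 1
Y = rng.standard_normal(n)               # response vector

Q, R = np.linalg.qr(X, mode="reduced")   # X = QR, Q is n x p with Q'Q = I
Yhat_qr = Q @ (Q.T @ Y)                  # perpendicular projection onto C(X)

betahat, *_ = np.linalg.lstsq(X, Y, rcond=None)
Yhat_ls = X @ betahat                    # least squares fitted values

print(np.allclose(Q.T @ Q, np.eye(p)))   # True: orthonormal columns
print(np.allclose(Yhat_qr, Yhat_ls))     # True: same projection
```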

2. Gram-Schmidt

The standard theoretical method for orthonormalizing vectors is to use the Gram-Schmidt (G-S) algorithm.

The Gram-Schmidt Theorem. There exists an orthonormal basis for $C(X)$, say $\{o_1, \ldots, o_r\}$, in which, for $s = 1, \ldots, r$, $\{o_1, \ldots, o_k\}$ is an orthonormal basis for $C(x_1, \ldots, x_s)$ when that space has rank $k$.

Proof. Define the $o_i$'s using the G-S algorithm. Let $\|v\| \equiv \sqrt{v'v}$ and let
$$o_1 = x_1/\|x_1\|, \qquad z_s = x_s - \sum_{k=1}^{s-1} o_k (o_k' x_s), \qquad o_s = z_s/\|z_s\|.$$
The proof by induction is standard (when $r(X) = p$) and omitted. However, if $\{x_1, \ldots, x_p\}$ is not a basis, $z_s = 0$ implies that $x_s$ is a linear function of $\{x_1, \ldots, x_{s-1}\}$. We exclude this $z_s$ from the orthonormal basis being constructed and adjust all subscripts accordingly. Clearly, when constructing a QR decomposition (as opposed to an orthonormal basis), we will be able to write the corresponding $x_s$ as a linear combination of $o_1, \ldots, o_k$.

The sweep operators used by some statistical regression programs are fairly straightforward applications of this G-S algorithm; see also LaMotte (2014).

Akin to Christensen (2011, Appendix B), G-S essentially defines
$$z_{s+1} = (I - M_k) x_{s+1}, \qquad M_k \equiv \sum_{j=1}^{k} o_j o_j',$$
where $M_k$ is the perpendicular projection operator onto $C(x_1, \ldots, x_s)$, which has rank $k$; i.e., $z_{s+1}$ is the perpendicular projection of $x_{s+1}$ onto the orthogonal complement of $C(x_1, \ldots, x_s)$.

A numerically more stable form of G-S involves taking the product of the individual projection operators,
$$z_{s+1} = \left[\prod_{j=1}^{k} (I - o_j o_j')\right] x_{s+1} = (I - o_k o_k') \cdots (I - o_2 o_2')(I - o_1 o_1') x_{s+1}.$$
Numerical stability is all about the order in which you do things. It is not hard to show that $\prod_{j=1}^{k} (I - o_j o_j') = I - M_k$. This method is not thought to be as stable as triangularization through orthogonal matrices.
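To make the two update rules concrete, here is a short Python sketch (my own code, not the paper's); the function name, the tolerance used to declare $z_s = 0$, and the test matrix are assumptions made for illustration.

```python
# Sketch of the two Gram-Schmidt variants described above.
import numpy as np

def gram_schmidt(X, tol=1e-12, modified=True):
    """Return an orthonormal basis for C(X) as the columns of the result.

    modified=False uses z_s = x_s - sum_k o_k (o_k' x_s) (classical G-S);
    modified=True applies the individual projections (I - o_k o_k') in
    sequence, the numerically more stable form noted in the text.
    Columns with z_s = 0 (numerically, ||z_s|| <= tol) are skipped, as in
    the rank-deficient case discussed above.
    """
    basis = []
    for x in X.T:                       # loop over the columns x_s of X
        z = x.astype(float)
        if modified:
            for o in basis:             # (I - o_k o_k') ... (I - o_1 o_1') x_s
                z = z - o * (o @ z)
        else:
            for o in basis:             # x_s - sum_k o_k (o_k' x_s)
                z = z - o * (o @ x)
        nz = np.linalg.norm(z)
        if nz > tol:                    # exclude z_s = 0 (x_s is redundant)
            basis.append(z / nz)
    return np.column_stack(basis)

rng = np.random.default_rng(1)
X = rng.standard_normal((10, 4))
O = gram_schmidt(X)
print(np.allclose(O.T @ O, np.eye(O.shape[1])))   # True: orthonormal columns
```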

3. Householder Transformations

Householder transformations reflect a vector in a (hyper)plane. For a unit vector $y$ with $\|y\| = 1$ define the Householder transformation as
$$H = I - 2yy'.$$
The reflection is relative to the plane that is the orthogonal complement of $C(y)$, written $C(y)^\perp$. In particular, write any vector $x$ as $x = x_0 + x_1$ with $x_0 \in C(y)$ and $x_1 \in C(y)^\perp$. Then $Hx = -x_0 + x_1$, which is the reflection in the plane. It is easily seen that each Householder matrix is symmetric and is its own inverse, i.e.,
$$H' = H \quad \text{and} \quad HH = I. \qquad (1)$$
In particular, $H$ is an orthogonal matrix.

In the Householder QR decomposition, $\tilde Q$ is the product of $r$ specific Householder transformations $H_j$, $j = 1, \ldots, r$, so that $\tilde Q = H_1 \cdots H_r$. We now define the matrices $H_j$. The key idea is to incorporate a previous orthonormal basis, e.g., the columns of the identity matrix $I = [e_1, \ldots, e_n]$. We will also use the matrix
$$E_j \equiv \sum_{k=j}^{n} e_k e_k',$$
which, as $j$ increases, orthogonally projects vectors into smaller and smaller subspaces. (In triangularization discussions of the Householder method, the use of $E_j$ corresponds to looking at smaller and smaller submatrices.)

For a collection of nonzero vectors $\{w_1, \ldots, w_r\}$ that we will specify later, define the Householder transformations
$$H_j \equiv I - 2\,\frac{v_j v_j'}{\|v_j\|^2}, \qquad v_j = w_j - \|w_j\| e_j.$$
These particular Householder transformations have a couple of peculiar properties:
$$H_j w_j = \|w_j\| e_j, \qquad H_j e_j = (1/\|w_j\|) w_j. \qquad (2)$$
These results are not difficult and follow from basic algebraic manipulations. The motivating idea is that $H_j$ reflects the vector $w_j$ into the direction of the orthonormal basis element $e_j$, an idea that can be used to triangularize a matrix. Specifically, we define $w_1 \equiv (1/\|x_1\|) x_1$ and, for $j = 1, \ldots, r-1$,
$$w_{j+1} = E_{j+1} H_j \cdots H_1 x_{j+1},$$

with the understanding that, since $X$ has rank $r$, we will see $p - r$ vectors with $w_{j+1} = 0$ that we exclude from the list. As indicated by our later proof of equivalence with G-S, $w_{j+1} = 0$ indicates $x_{j+1} \in C(x_1, \ldots, x_j)$. Moreover, the very definition of $v_{j+1}$ requires $w_{j+1} \neq 0$. Note that the matrices $H_{j+1}$ are defined recursively.

It is important to show that, for $k < j$,
$$H_j e_k = e_k. \qquad (3)$$
This follows directly from the orthonormality of the $e_j$'s, the fact that
$$e_k' v_j = e_k'[w_j - \|w_j\| e_j] = e_k' E_j H_{j-1} \cdots H_1 x_j - e_k' e_j \|w_j\| = 0 - 0,$$
and the fact that $H_j e_k = I e_k - v_j (v_j' e_k)(2/\|v_j\|^2)$. Moreover, it follows immediately from (3) that, for $k \le j$,
$$H_1 \cdots H_j e_k = H_1 \cdots H_k e_k. \qquad (4)$$

We can now define our orthonormal basis as
$$o_j \equiv H_1 \cdots H_j e_j = H_1 \cdots H_r e_j, \qquad j = 1, \ldots, r.$$
It is easy to see that these vectors are orthonormal. Using equations (1),
$$o_i' o_j = [H_1 \cdots H_r e_i]'[H_1 \cdots H_r e_j] = e_i' H_r \cdots H_1 H_1 \cdots H_r e_j = e_i' e_j = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{if } i \neq j. \end{cases}$$

The problem is showing that, in the full rank case say, $o_s$ is in the space spanned by $x_1, \ldots, x_s$, $s = 1, \ldots, p$. By definition and equations (2), $o_1$ is in the space spanned by $x_1$. The key fact in an induction proof is that computing $H_1 \cdots H_j e_j$ is equivalent to performing the G-S algorithm on $x_j$. We demonstrate, for $j = 1, \ldots, r-1$, that $o_{j+1} \equiv H_1 \cdots H_{j+1} e_{j+1}$ is the same as the inductive step of G-S. By (2),
$$o_{j+1} \equiv H_1 \cdots H_{j+1} e_{j+1} = H_1 \cdots H_j w_{j+1} (1/\|w_{j+1}\|).$$
We will show that
$$H_1 \cdots H_j w_{j+1} = z_{j+1},$$
where $z_{j+1}$ is defined in the G-S algorithm. In G-S, this vector would be normalized by dividing it by $\|H_1 \cdots H_j w_{j+1}\|$, but in the Householder method the normalization occurs through dividing $w_{j+1}$ by $\|w_{j+1}\|$. These normalizations agree because $\|w_{j+1}\| = \|H_1 \cdots H_j w_{j+1}\|$, since Householder transformations are orthogonal and do not change a vector's length.
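Before completing the algebra, the construction can be mirrored numerically. The following Python sketch (mine, not the paper's; function names and test data are assumptions) builds the $w_j$, $H_j$, and $o_j$ exactly as defined above for a random full-rank $X$, checks the orthonormality just established, and checks the equality $H_1 \cdots H_j w_{j+1} = z_{j+1}$ proved below. The sign convention $v_j = w_j - \|w_j\| e_j$ follows the paper; production Householder codes usually choose the opposite sign when needed to avoid cancellation.

```python
# Sketch of the Section 3 construction (full-rank X assumed).
import numpy as np

def householder_basis(X):
    """Build the o_j = H_1 ... H_j e_j of Section 3 for a full-rank X (n x p)."""
    n, p = X.shape
    I = np.eye(n)
    Hs, O = [], []
    for j in range(p):                        # j = 0 is the paper's j+1 = 1
        w = X[:, j].astype(float)             # the 1/||x_1|| scaling of w_1 does not affect H_1 or o_1
        for H in Hs:                          # H_j ... H_1 x_{j+1}, with H_1 applied first
            w = H @ w
        w[:j] = 0.0                           # E_{j+1}: project onto span{e_{j+1}, ..., e_n}
        v = w - np.linalg.norm(w) * I[:, j]   # v_{j+1} = w_{j+1} - ||w_{j+1}|| e_{j+1}
        H = I - 2.0 * np.outer(v, v) / (v @ v)
        Hs.append(H)
        o = I[:, j]                           # o_{j+1} = H_1 ... H_{j+1} e_{j+1}
        for Hk in reversed(Hs):
            o = Hk @ o
        O.append(o)
    return np.column_stack(O)

rng = np.random.default_rng(2)
X = rng.standard_normal((8, 4))               # full rank with probability 1
O = householder_basis(X)

print(np.allclose(O.T @ O, np.eye(4)))        # True: orthonormality, as shown via (1)

# G-S equivalence: o_{j+1} should equal z_{j+1}/||z_{j+1}||, where
# z_{j+1} = x_{j+1} - sum_{k<=j} o_k (o_k' x_{j+1}) is the G-S inductive step.
for j in range(4):
    z = X[:, j] - O[:, :j] @ (O[:, :j].T @ X[:, j])
    print(np.allclose(O[:, j], z / np.linalg.norm(z)))   # True for every j
```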

Starting with the definition of $w_{j+1}$ and assuming that $o_1, \ldots, o_j$ are appropriate orthonormal vectors,
$$H_1 \cdots H_j w_{j+1} = H_1 \cdots H_j E_{j+1} H_j \cdots H_1 x_{j+1}
= H_1 \cdots H_j \left(I - \sum_{k=1}^{j} e_k e_k'\right) H_j \cdots H_1 x_{j+1}$$
$$= H_1 \cdots H_j H_j \cdots H_1 x_{j+1} - \sum_{k=1}^{j} H_1 \cdots H_j e_k e_k' H_j \cdots H_1 x_{j+1}
= x_{j+1} - \sum_{k=1}^{j} o_k o_k' x_{j+1}.$$
Here the first term simplifies by (1), and in the second term equation (4) gives $H_1 \cdots H_j e_k = H_1 \cdots H_k e_k = o_k$, so that $e_k' H_j \cdots H_1 x_{j+1} = [H_1 \cdots H_j e_k]' x_{j+1} = o_k' x_{j+1}$. This is exactly the vector $z_{j+1}$ from the G-S algorithm. Normalizing it gives $o_{j+1}$.

References

Christensen, Ronald (2011). Plane Answers to Complex Questions: The Theory of Linear Models, 4th Edition. Springer-Verlag, New York.

Harville, David A. (1997). Matrix Algebra From a Statistician's Perspective. Springer, New York.

LaMotte, Lynn Roy (2014). The Gram-Schmidt Construction as a Basis for Linear Models. The American Statistician, 68, 52-55.