MATH 532: Linear Algebra


MATH 532: Linear Algebra Chapter 5: Norms, Inner Products and Orthogonality Greg Fasshauer Department of Applied Mathematics Illinois Institute of Technology Spring 2015 fasshauer@iit.edu MATH 532 1

Outline 1 Vector Norms 2 Matrix Norms 3 Inner Product Spaces 4 Orthogonal Vectors 5 Gram Schmidt Orthogonalization & QR Factorization 6 Unitary and Orthogonal Matrices 7 Orthogonal Reduction 8 Complementary Subspaces 9 Orthogonal Decomposition 10 Singular Value Decomposition 11 Orthogonal Projections fasshauer@iit.edu MATH 532 2

Vector Norms. Definition. Let $x, y \in \mathbb{R}^n$ ($\mathbb{C}^n$). Then $x^T y = \sum_{i=1}^n x_i y_i \in \mathbb{R}$ (respectively $x^* y = \sum_{i=1}^n \bar{x}_i y_i \in \mathbb{C}$) is called the standard inner product for $\mathbb{R}^n$ ($\mathbb{C}^n$).

Vector Norms. Definition. Let $V$ be a vector space. A function $\|\cdot\| : V \to \mathbb{R}_{\ge 0}$ is called a norm provided for any $x, y \in V$ and $\alpha \in \mathbb{R}$: (1) $\|x\| \ge 0$ and $\|x\| = 0$ if and only if $x = 0$; (2) $\|\alpha x\| = |\alpha|\,\|x\|$; (3) $\|x+y\| \le \|x\| + \|y\|$. Remark. The inequality in (3) is known as the triangle inequality.

Vector Norms. Remark. Any inner product $\langle\cdot,\cdot\rangle$ induces a norm via (more later) $\|x\| = \sqrt{\langle x,x\rangle}$. We will show that the standard inner product induces the Euclidean norm (cf. length of a vector). Remark. Inner products let us define angles via $\cos\theta = \frac{x^T y}{\|x\|\,\|y\|}$. In particular, $x, y$ are orthogonal if and only if $x^T y = 0$.

Vector Norms. Example. Let $x \in \mathbb{R}^n$ and consider the Euclidean norm $\|x\|_2 = \sqrt{x^T x} = \left(\sum_{i=1}^n x_i^2\right)^{1/2}$. We show that $\|\cdot\|_2$ is a norm. We do this for the real case, but the complex case goes analogously. (1) Clearly, $\|x\|_2 \ge 0$. Also, $\|x\|_2 = 0 \iff \|x\|_2^2 = 0 \iff \sum_{i=1}^n x_i^2 = 0 \iff x_i = 0,\ i=1,\dots,n \iff x = 0$.

Vector Norms. Example (cont.). (2) We have $\|\alpha x\|_2 = \left(\sum_{i=1}^n(\alpha x_i)^2\right)^{1/2} = |\alpha|\left(\sum_{i=1}^n x_i^2\right)^{1/2} = |\alpha|\,\|x\|_2$. (3) To establish (3) we need the Lemma (Cauchy–Schwarz–Bunyakovsky): Let $x, y \in \mathbb{R}^n$. Then $|x^T y| \le \|x\|_2\,\|y\|_2$. Moreover, equality holds if and only if $y = \alpha x$ with $\alpha = \frac{x^T y}{\|x\|_2^2}$.

Vector Norms. Motivation for Proof of Cauchy–Schwarz–Bunyakovsky. As already alluded to above, the angle $\theta$ between two vectors $a$ and $b$ is related to the inner product by $\cos\theta = \frac{a^T b}{\|a\|\,\|b\|}$. Using trigonometry as in the figure, the projection of $a$ onto $b$ is then $\|a\|\cos\theta\,\frac{b}{\|b\|} = \|a\|\,\frac{a^T b}{\|a\|\,\|b\|}\,\frac{b}{\|b\|} = \frac{a^T b}{\|b\|^2}\,b$. Now we let $y = a$ and $x = b$, so that the projection of $y$ onto $x$ is given by $\alpha x$, where $\alpha = \frac{x^T y}{\|x\|^2}$.

Vector Norms. Proof of Cauchy–Schwarz–Bunyakovsky. We know that $\|y - \alpha x\|_2^2 \ge 0$ since it is (the square of) a norm. Therefore, $0 \le \|y-\alpha x\|_2^2 = (y-\alpha x)^T(y-\alpha x) = y^Ty - 2\alpha x^Ty + \alpha^2 x^Tx = y^Ty - 2\frac{x^Ty}{\|x\|_2^2}x^Ty + \left(\frac{x^Ty}{\|x\|_2^2}\right)^2\|x\|_2^2 = \|y\|_2^2 - \frac{(x^Ty)^2}{\|x\|_2^2}$. This implies $(x^Ty)^2 \le \|x\|_2^2\,\|y\|_2^2$, and the Cauchy–Schwarz–Bunyakovsky inequality follows by taking square roots.

Vector Norms. Proof (cont.). Now we look at the equality claim. ⇐: Assume $|x^Ty| = \|x\|_2\|y\|_2$. But then the first part of the proof shows that $\|y-\alpha x\|_2 = 0$, so that $y = \alpha x$. ⇒: Assume $y = \alpha x$. Then $x^Ty = x^T(\alpha x) = \alpha\|x\|_2^2$ and $\|x\|_2\|y\|_2 = \|x\|_2\|\alpha x\|_2 = |\alpha|\,\|x\|_2^2$, so that we have equality.

Vector Norms. Example (cont.). (3) Now we can show that $\|\cdot\|_2$ satisfies the triangle inequality: $\|x+y\|_2^2 = (x+y)^T(x+y) = \|x\|_2^2 + 2x^Ty + \|y\|_2^2 \le \|x\|_2^2 + 2|x^Ty| + \|y\|_2^2 \overset{\text{CSB}}{\le} \|x\|_2^2 + 2\|x\|_2\|y\|_2 + \|y\|_2^2 = (\|x\|_2 + \|y\|_2)^2$. Now we just need to take square roots to have the triangle inequality.

Vector Norms. Lemma. Let $x, y \in \mathbb{R}^n$. Then we have the backward triangle inequality $\big|\,\|x\| - \|y\|\,\big| \le \|x-y\|$. Proof. We write $\|x\| = \|x-y+y\| \le \|x-y\| + \|y\|$ by the triangle inequality. But this implies $\|x\| - \|y\| \le \|x-y\|$.

Vector Norms. Proof (cont.). Switch the roles of $x$ and $y$ to get $\|y\| - \|x\| \le \|y-x\| \iff -(\|x\| - \|y\|) \le \|x-y\|$. Together with the previous inequality we have $\big|\,\|x\| - \|y\|\,\big| \le \|x-y\|$.

Other common norms. $\ell_1$-norm (or taxi-cab norm, Manhattan norm): $\|x\|_1 = \sum_{i=1}^n |x_i|$. $\ell_\infty$-norm (or maximum norm, Chebyshev norm): $\|x\|_\infty = \max_{1\le i\le n}|x_i|$. $\ell_p$-norm: $\|x\|_p = \left(\sum_{i=1}^n |x_i|^p\right)^{1/p}$. Remark. In the homework you will use Hölder's and Minkowski's inequalities to show that the p-norm is a norm.
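
Not part of the slides: a minimal NumPy sketch evaluating these norms for an arbitrary example vector and checking them against numpy.linalg.norm.

```python
import numpy as np

x = np.array([3.0, -4.0, 1.0])

# l1-norm: sum of absolute values
norm_1 = np.sum(np.abs(x))

# l2-norm (Euclidean): sqrt(x^T x)
norm_2 = np.sqrt(x @ x)

# l-infinity norm: largest absolute component
norm_inf = np.max(np.abs(x))

# general p-norm for, e.g., p = 3
p = 3
norm_p = np.sum(np.abs(x)**p)**(1.0 / p)

# compare with NumPy's built-in implementations
assert np.isclose(norm_1, np.linalg.norm(x, 1))
assert np.isclose(norm_2, np.linalg.norm(x, 2))
assert np.isclose(norm_inf, np.linalg.norm(x, np.inf))
assert np.isclose(norm_p, np.linalg.norm(x, p))
print(norm_1, norm_2, norm_inf, norm_p)
```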

Vector Norms. Remark. We now show that $\|x\|_\infty = \lim_{p\to\infty}\|x\|_p$. Let's use tildes to mark all components of $x$ that are maximal in absolute value, i.e. $|\tilde{x}_1| = |\tilde{x}_2| = \dots = |\tilde{x}_k| = \max_{1\le i\le n}|x_i|$. The remaining components are then $x_{k+1},\dots,x_n$. This implies that $\left|\frac{x_i}{\tilde{x}_1}\right| < 1$ for $i = k+1,\dots,n$.

Vector Norms. Remark (cont.). Now $\|x\|_p = \left(\sum_{i=1}^n |x_i|^p\right)^{1/p} = |\tilde{x}_1|\left(k + \left|\frac{x_{k+1}}{\tilde{x}_1}\right|^p + \dots + \left|\frac{x_n}{\tilde{x}_1}\right|^p\right)^{1/p}$, where each ratio inside the parentheses has absolute value less than 1. Since the terms inside the parentheses except for $k$ go to 0 as $p\to\infty$, the factor $(\cdot)^{1/p} \to 1$ as $p\to\infty$. And so $\|x\|_p \to |\tilde{x}_1| = \max_{1\le i\le n}|x_i| = \|x\|_\infty$.

Vector Norms. Figure: Unit balls in $\mathbb{R}^2$ for the $\ell_1$, $\ell_2$ and $\ell_\infty$ norms. Note that $B_1 \subseteq B_2 \subseteq B_\infty$ since, e.g., for the vector $\left(\frac{\sqrt{2}}{2}, \frac{\sqrt{2}}{2}\right)^T$ we have $\|\cdot\|_1 = \sqrt{2}$, $\|\cdot\|_2 = 1$, and $\|\cdot\|_\infty = \frac{\sqrt{2}}{2}$, so this vector lies outside $B_1$, on the boundary of $B_2$, and inside $B_\infty$. In fact, we have in general (similar to HW) $\|x\|_1 \ge \|x\|_2 \ge \|x\|_\infty$ for any $x\in\mathbb{R}^n$.

Vector Norms. Norm equivalence. Definition. Two norms $\|\cdot\|$ and $\|\cdot\|'$ on a vector space $V$ are called equivalent if there exist constants $\alpha, \beta$ such that $\alpha \le \frac{\|x\|}{\|x\|'} \le \beta$ for all $x(\ne 0)\in V$. Example. $\|\cdot\|_1$ and $\|\cdot\|_2$ are equivalent since from above $\|x\|_1 \ge \|x\|_2$ and also $\|x\|_1 \le \sqrt{n}\,\|x\|_2$ (see HW), so that $\alpha = 1 \le \frac{\|x\|_1}{\|x\|_2} \le \sqrt{n} = \beta$. Remark. In fact, all norms on finite-dimensional vector spaces are equivalent.

Matrix Norms. Matrix norms are special norms: they will satisfy one additional property. This property should help us measure $\|AB\|$ for two matrices $A, B$ of appropriate sizes. We look at the simplest matrix norm, the Frobenius norm, defined for $A\in\mathbb{R}^{m\times n}$: $\|A\|_F = \left(\sum_{i=1}^m\sum_{j=1}^n |a_{ij}|^2\right)^{1/2} = \left(\sum_{i=1}^m\|A_{i*}\|_2^2\right)^{1/2} = \left(\sum_{j=1}^n\|A_{*j}\|_2^2\right)^{1/2} = \sqrt{\operatorname{trace}(A^TA)}$, i.e., the Frobenius norm is just a 2-norm for the vector that contains all elements of the matrix.
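
A small numerical check (not from the slides) of the three equivalent expressions for the Frobenius norm; the 3×3 matrix is just an example.

```python
import numpy as np

A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])

# entrywise definition: 2-norm of the vector of all entries
frob_entries = np.sqrt(np.sum(np.abs(A)**2))

# via row norms: sqrt of the sum of squared row 2-norms
frob_rows = np.sqrt(np.sum(np.linalg.norm(A, axis=1)**2))

# via the trace formula: sqrt(trace(A^T A))
frob_trace = np.sqrt(np.trace(A.T @ A))

assert np.isclose(frob_entries, frob_rows)
assert np.isclose(frob_entries, frob_trace)
assert np.isclose(frob_entries, np.linalg.norm(A, 'fro'))
print(frob_entries)  # 3.0
```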

Matrix Norms. Now $\|Ax\|_2^2 = \sum_{i=1}^m |A_{i*}x|^2 \overset{\text{CSB}}{\le} \sum_{i=1}^m\|A_{i*}\|_2^2\,\|x\|_2^2 = \|A\|_F^2\,\|x\|_2^2$, so that $\|Ax\|_2 \le \|A\|_F\,\|x\|_2$. We can generalize this to matrices, i.e., we have $\|AB\|_F \le \|A\|_F\,\|B\|_F$, which motivates us to require this submultiplicativity for any matrix norm.

Matrix Norms. Definition. A matrix norm is a function $\|\cdot\|$ from the set of all real (or complex) matrices of finite size into $\mathbb{R}_{\ge 0}$ that satisfies (1) $\|A\| \ge 0$ and $\|A\| = 0$ if and only if $A = O$ (a matrix of all zeros); (2) $\|\alpha A\| = |\alpha|\,\|A\|$ for all $\alpha\in\mathbb{R}$; (3) $\|A+B\| \le \|A\| + \|B\|$ (requires $A, B$ to be of the same size); (4) $\|AB\| \le \|A\|\,\|B\|$ (requires $A, B$ to have appropriate sizes). Remark. This definition is usually too general. In addition to the Frobenius norm, most useful matrix norms are induced by a vector norm.

Matrix Norms. Induced matrix norms. Theorem. Let $\|\cdot\|_{(m)}$ and $\|\cdot\|_{(n)}$ be vector norms on $\mathbb{R}^m$ and $\mathbb{R}^n$, respectively, and let $A$ be an $m\times n$ matrix. Then $\|A\| = \max_{\|x\|_{(n)}=1}\|Ax\|_{(m)}$ is a matrix norm, called the induced matrix norm. Remark. Here the vector norm could be any vector norm, in particular any p-norm. For example, we could have $\|A\|_2 = \max_{\|x\|_{2,(n)}=1}\|Ax\|_{2,(m)}$. To keep notation simple we often drop indices.

Matrix Norms. Proof. (1) $\|A\| \ge 0$ is obvious since this holds for the vector norm. It remains to show that $\|A\| = 0$ if and only if $A = O$. Assume $A = O$; then $\|A\| = \max_{\|x\|=1}\|Ax\| = 0$. So now consider $A \ne O$. We need to show that $\|A\| > 0$. There must exist a column of $A$ that is not $0$. We call this column $A_{*k}$ and take $x = e_k$. Then, since $\|e_k\| = 1$, $\|A\| = \max_{\|x\|=1}\|Ax\| \ge \|Ae_k\| = \|A_{*k}\| > 0$ since $A_{*k}\ne 0$.

Matrix Norms. Proof (cont.). (2) Using the corresponding property for the vector norm we have $\|\alpha A\| = \max_{\|x\|=1}\|\alpha Ax\| = |\alpha|\max_{\|x\|=1}\|Ax\| = |\alpha|\,\|A\|$. (3) Also straightforward (based on the triangle inequality for the vector norm): $\|A+B\| = \max_{\|x\|=1}\|(A+B)x\| = \max_{\|x\|=1}\|Ax+Bx\| \le \max_{\|x\|=1}(\|Ax\|+\|Bx\|) \le \max_{\|x\|=1}\|Ax\| + \max_{\|x\|=1}\|Bx\| = \|A\| + \|B\|$.

Matrix Norms. Proof (cont.). (4) First note that $\|A\| = \max_{\|x\|=1}\|Ax\| = \max_{x\ne 0}\frac{\|Ax\|}{\|x\|}$, and therefore $\|Ax\| \le \|A\|\,\|x\|$. (1) But then we also have $\|AB\| \le \|A\|\,\|B\|$ since $\|AB\| = \max_{\|x\|=1}\|ABx\| = \|ABy\|$ (for some $y$ with $\|y\|=1$) $\overset{(1)}{\le} \|A\|\,\|By\| \overset{(1)}{\le} \|A\|\,\|B\|\,\underbrace{\|y\|}_{=1} = \|A\|\,\|B\|$.

Matrix Norms. Remark. One can show (see HW) that if $A$ is invertible, $\min_{\|x\|=1}\|Ax\| = \frac{1}{\|A^{-1}\|}$. The induced matrix norm can be interpreted geometrically: $\|A\|$ is the most a vector on the unit sphere can be stretched when transformed by $A$, and $\frac{1}{\|A^{-1}\|}$ is the most a vector on the unit sphere can be shrunk when transformed by $A$.

Matrix 2-norm. Theorem. Let $A$ be an $m\times n$ matrix. Then (1) $\|A\|_2 = \max_{\|x\|_2=1}\|Ax\|_2 = \sqrt{\lambda_{\max}}$; (2) $\|A^{-1}\|_2 = \frac{1}{\min_{\|x\|_2=1}\|Ax\|_2} = \frac{1}{\sqrt{\lambda_{\min}}}$, where $\lambda_{\max}$ and $\lambda_{\min}$ are the largest and smallest eigenvalues of $A^TA$, respectively. Remark. We also have $\sqrt{\lambda_{\max}} = \sigma_1$, the largest singular value of $A$, and $\sqrt{\lambda_{\min}} = \sigma_n$, the smallest singular value of $A$.
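
A quick sketch (not from the slides) verifying numerically that the 2-norm equals both the square root of the largest eigenvalue of $A^TA$ and the largest singular value; the matrix is random.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))

# spectral norm via NumPy
norm2 = np.linalg.norm(A, 2)

# sqrt of the largest eigenvalue of A^T A (symmetric, so eigvalsh applies)
lam_max = np.max(np.linalg.eigvalsh(A.T @ A))

# largest singular value of A
sigma_1 = np.max(np.linalg.svd(A, compute_uv=False))

assert np.isclose(norm2, np.sqrt(lam_max))
assert np.isclose(norm2, sigma_1)
```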

Matrix Norms. Proof. We will show only (1), the largest singular value ((2) goes similarly). The idea is to solve a constrained optimization problem (as in calculus), i.e., maximize $f(x) = \|Ax\|_2^2 = (Ax)^TAx$ subject to $g(x) = \|x\|_2^2 = x^Tx = 1$. We do this by introducing a Lagrange multiplier $\lambda$ and defining $h(x,\lambda) = f(x) - \lambda g(x) = x^TA^TAx - \lambda x^Tx$.

Matrix Norms. Proof (cont.). Necessary (and, since the problem is quadratic, sufficient) conditions for a maximum: $\frac{\partial h}{\partial x_i} = 0$, $i=1,\dots,n$, together with $g(x) = 1$. Now $\frac{\partial}{\partial x_i}\left(x^TA^TAx - \lambda x^Tx\right) = e_i^TA^TAx + x^TA^TAe_i - \lambda\left(e_i^Tx + x^Te_i\right) = 2e_i^TA^TAx - 2\lambda e_i^Tx = 2\left((A^TAx)_i - (\lambda x)_i\right)$, $i=1,\dots,n$. Together this yields $A^TAx - \lambda x = 0 \iff (A^TA - \lambda I)x = 0$, so that $\lambda$ must be an eigenvalue of $A^TA$ (since $g(x) = x^Tx = 1$ ensures $x\ne 0$).

Matrix Norms. Proof (cont.). In fact, as we now show, $\lambda$ is the maximal eigenvalue. First, $A^TAx = \lambda x \implies x^TA^TAx = \lambda x^Tx = \lambda$, so that $\|Ax\|_2^2 = x^TA^TAx = \lambda$. And then $\|A\|_2 = \max_{\|x\|_2=1}\|Ax\|_2 = \sqrt{\max_{\|x\|_2=1}\|Ax\|_2^2} = \sqrt{\lambda_{\max}}$.

Matrix Norms. Special properties of the 2-norm. (1) $\|A\|_2 = \max_{\|x\|_2=1}\max_{\|y\|_2=1}|y^TAx|$. (2) $\|A\|_2 = \|A^T\|_2$. (3) $\|A^TA\|_2 = \|A\|_2^2 = \|AA^T\|_2$. (4) $\left\|\begin{pmatrix}A & O\\O & B\end{pmatrix}\right\|_2 = \max\{\|A\|_2, \|B\|_2\}$. (5) $\|U^TAV\|_2 = \|A\|_2$ provided $UU^T = I$ and $V^TV = I$ (orthogonal matrices). Remark. The proof is a HW problem.

Matrix Norms. Matrix 1-norm and ∞-norm. Theorem. Let $A$ be an $m\times n$ matrix. Then we have (1) the column sum norm $\|A\|_1 = \max_{\|x\|_1=1}\|Ax\|_1 = \max_{j=1,\dots,n}\sum_{i=1}^m |a_{ij}|$, and (2) the row sum norm $\|A\|_\infty = \max_{\|x\|_\infty=1}\|Ax\|_\infty = \max_{i=1,\dots,m}\sum_{j=1}^n |a_{ij}|$. Remark. We know these are norms, so what we need to do is verify that the formulas hold. We will show (1).
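
A short check (not part of the slides) of the column-sum and row-sum formulas against NumPy's induced 1- and ∞-norms; the matrix is random.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 6))

# column sum norm: maximum absolute column sum
col_sum_norm = np.max(np.sum(np.abs(A), axis=0))

# row sum norm: maximum absolute row sum
row_sum_norm = np.max(np.sum(np.abs(A), axis=1))

assert np.isclose(col_sum_norm, np.linalg.norm(A, 1))
assert np.isclose(row_sum_norm, np.linalg.norm(A, np.inf))
```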

Matrix Norms. Proof. First we look at $\|Ax\|_1$: $\|Ax\|_1 = \sum_{i=1}^m |(Ax)_i| = \sum_{i=1}^m\Big|\sum_{j=1}^n a_{ij}x_j\Big| \le \sum_{i=1}^m\sum_{j=1}^n |a_{ij}|\,|x_j| = \sum_{j=1}^n |x_j|\sum_{i=1}^m |a_{ij}| \le \left(\max_{j=1,\dots,n}\sum_{i=1}^m |a_{ij}|\right)\sum_{j=1}^n |x_j|$. Since we actually need to look at $\|Ax\|_1$ for $\|x\|_1 = 1$ and $\|x\|_1 = \sum_{j=1}^n |x_j|$, we have $\|Ax\|_1 \le \max_{j=1,\dots,n}\sum_{i=1}^m |a_{ij}|$.

Matrix Norms. Proof (cont.). We even have equality since for $x = e_k$, where $k$ is the index such that $A_{*k}$ has maximum column sum, we get $\|Ax\|_1 = \|Ae_k\|_1 = \|A_{*k}\|_1 = \sum_{i=1}^m |a_{ik}| = \max_{j=1,\dots,n}\sum_{i=1}^m |a_{ij}|$ due to our choice of $k$. Since $\|e_k\|_1 = 1$ we indeed have the desired formula.

Inner Product Spaces. Definition. A general inner product on a real (complex) vector space $V$ is a symmetric (Hermitian) bilinear form $\langle\cdot,\cdot\rangle : V\times V\to\mathbb{R}$ ($\mathbb{C}$), i.e., (1) $\langle x,x\rangle\in\mathbb{R}_{\ge 0}$ with $\langle x,x\rangle = 0$ if and only if $x = 0$; (2) $\langle x,\alpha y\rangle = \alpha\langle x,y\rangle$ for all scalars $\alpha$; (3) $\langle x,y+z\rangle = \langle x,y\rangle + \langle x,z\rangle$; (4) $\langle x,y\rangle = \langle y,x\rangle$ (or $\langle x,y\rangle = \overline{\langle y,x\rangle}$ if complex). Remark. The following two properties (providing bilinearity) are implied (see HW): $\langle\alpha x,y\rangle = \alpha\langle x,y\rangle$ and $\langle x+y,z\rangle = \langle x,z\rangle + \langle y,z\rangle$.

Inner Product Spaces. As before, any inner product induces a norm via $\|\cdot\| = \sqrt{\langle\cdot,\cdot\rangle}$. One can show (analogous to the Euclidean case) that $\|\cdot\|$ is a norm. In particular, we have a general Cauchy–Schwarz–Bunyakovsky inequality $|\langle x,y\rangle| \le \|x\|\,\|y\|$.

Inner Product Spaces. Example. (1) $\langle x,y\rangle = x^Ty$ (or $x^*y$), the standard inner product for $\mathbb{R}^n$ ($\mathbb{C}^n$). (2) For nonsingular matrices $A$ we get the A-inner product on $\mathbb{R}^n$, i.e., $\langle x,y\rangle = x^TA^TAy$ with $\|x\|_A = \sqrt{\langle x,x\rangle} = \sqrt{x^TA^TAx} = \|Ax\|_2$. (3) If $V = \mathbb{R}^{m\times n}$ (or $\mathbb{C}^{m\times n}$), then we get the standard inner product for matrices, i.e., $\langle A,B\rangle = \operatorname{trace}(A^TB)$ (or $\operatorname{trace}(A^*B)$) with $\|A\| = \sqrt{\langle A,A\rangle} = \sqrt{\operatorname{trace}(A^TA)} = \|A\|_F$.
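
A small sketch (not from the slides) illustrating examples (2) and (3): the A-inner product with its induced norm $\|Ax\|_2$, and the trace inner product with the Frobenius norm. All matrices and vectors are random examples.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
A = rng.standard_normal((n, n)) + n * np.eye(n)   # essentially always nonsingular
x, y = rng.standard_normal(n), rng.standard_normal(n)

# A-inner product <x, y>_A = x^T A^T A y; it is symmetric and induces ||x||_A = ||Ax||_2
ip_A = x @ (A.T @ A) @ y
assert np.isclose(ip_A, y @ (A.T @ A) @ x)
assert np.isclose(np.sqrt(x @ (A.T @ A) @ x), np.linalg.norm(A @ x))

# standard matrix inner product <B, C> = trace(B^T C), inducing the Frobenius norm
B, C = rng.standard_normal((3, 2)), rng.standard_normal((3, 2))
ip_mat = np.trace(B.T @ C)
assert np.isclose(np.sqrt(np.trace(B.T @ B)), np.linalg.norm(B, 'fro'))
```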

Inner Product Spaces. Remark. In the infinite-dimensional setting we have, e.g., for continuous functions $f, g$ on $(a,b)$, $\langle f,g\rangle = \int_a^b f(t)g(t)\,dt$ with $\|f\| = \sqrt{\langle f,f\rangle} = \left(\int_a^b (f(t))^2\,dt\right)^{1/2}$.

Inner Product Spaces. Parallelogram identity. In any inner product space the so-called parallelogram identity holds, i.e., $\|x+y\|^2 + \|x-y\|^2 = 2\left(\|x\|^2 + \|y\|^2\right)$. (2) This is true since $\|x+y\|^2 + \|x-y\|^2 = \langle x+y,x+y\rangle + \langle x-y,x-y\rangle = \langle x,x\rangle + \langle x,y\rangle + \langle y,x\rangle + \langle y,y\rangle + \langle x,x\rangle - \langle x,y\rangle - \langle y,x\rangle + \langle y,y\rangle = 2\langle x,x\rangle + 2\langle y,y\rangle = 2\left(\|x\|^2 + \|y\|^2\right)$.

Inner Product Spaces. Polarization identity. The following theorem shows that we not only get a norm from an inner product (i.e., every Hilbert space is a Banach space), but if the parallelogram identity holds then we can get an inner product from a norm (i.e., a Banach space becomes a Hilbert space). Theorem. Let $V$ be a real vector space with norm $\|\cdot\|$. If the parallelogram identity (2) holds then $\langle x,y\rangle = \frac{1}{4}\left(\|x+y\|^2 - \|x-y\|^2\right)$ (3) is an inner product on $V$.
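
A numerical illustration (not from the slides): polarization applied to the 2-norm recovers the standard inner product, while the parallelogram identity fails for the 1-norm, anticipating the counterexample $x = e_1$, $y = e_2$ used in the later proof that only the 2-norm among p-norms is induced by an inner product.

```python
import numpy as np

x = np.array([1.0, 2.0, -1.0])
y = np.array([0.5, 0.0, 3.0])

# polarization identity with the 2-norm recovers the standard inner product
lhs = 0.25 * (np.linalg.norm(x + y)**2 - np.linalg.norm(x - y)**2)
assert np.isclose(lhs, x @ y)

# parallelogram identity holds for the 2-norm ...
def parallelogram_gap(x, y, p):
    n = lambda v: np.linalg.norm(v, p)
    return n(x + y)**2 + n(x - y)**2 - 2 * (n(x)**2 + n(y)**2)

assert np.isclose(parallelogram_gap(x, y, 2), 0.0)

# ... but fails for p = 1 (counterexample x = e1, y = e2)
e1, e2 = np.eye(2)
print(parallelogram_gap(e1, e2, 1))  # 8 - 4 = 4, not 0
```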

Inner Product Spaces. Proof. We need to show that all four properties of a general inner product hold. (1) Nonnegativity: $\langle x,x\rangle = \frac{1}{4}\left(\|x+x\|^2 - \|x-x\|^2\right) = \frac{1}{4}\|2x\|^2 = \|x\|^2 \ge 0$. Moreover, $\langle x,x\rangle > 0$ if and only if $x \ne 0$ since $\langle x,x\rangle = \|x\|^2$. (4) Symmetry: $\langle x,y\rangle = \langle y,x\rangle$ is clear since $\|x-y\| = \|y-x\|$.

Inner Product Spaces. Proof (cont.). (3) Additivity: The parallelogram identity implies $\|x+y\|^2 + \|x+z\|^2 = \frac{1}{2}\left(\|2x+y+z\|^2 + \|y-z\|^2\right)$ (4) and $\|x-y\|^2 + \|x-z\|^2 = \frac{1}{2}\left(\|2x-y-z\|^2 + \|z-y\|^2\right)$ (5). Subtracting (5) from (4) we get $\|x+y\|^2 - \|x-y\|^2 + \|x+z\|^2 - \|x-z\|^2 = \frac{1}{2}\left(\|2x+y+z\|^2 - \|2x-y-z\|^2\right)$. (6)

Inner Product Spaces. Proof (cont.). The specific form of the polarized inner product implies $\langle x,y\rangle + \langle x,z\rangle = \frac{1}{4}\left(\|x+y\|^2 - \|x-y\|^2 + \|x+z\|^2 - \|x-z\|^2\right) \overset{(6)}{=} \frac{1}{8}\left(\|2x+y+z\|^2 - \|2x-y-z\|^2\right) = \frac{1}{2}\left(\left\|x+\tfrac{y+z}{2}\right\|^2 - \left\|x-\tfrac{y+z}{2}\right\|^2\right) \overset{\text{polarization}}{=} 2\left\langle x,\tfrac{y+z}{2}\right\rangle$. (7) Setting $z = 0$ in (7) yields $\langle x,y\rangle = 2\left\langle x,\tfrac{y}{2}\right\rangle$ (8), since $\langle x,0\rangle = 0$.

Inner Product Spaces. Proof (cont.). To summarize, we have $\langle x,y\rangle + \langle x,z\rangle = 2\left\langle x,\tfrac{y+z}{2}\right\rangle$ (7) and $\langle x,y\rangle = 2\left\langle x,\tfrac{y}{2}\right\rangle$ (8). Since (8) is true for any $y\in V$ we can, in particular, replace $y$ by $y+z$ so that we have $\langle x,y+z\rangle = 2\left\langle x,\tfrac{y+z}{2}\right\rangle$. This, however, is the right-hand side of (7), so that we end up with $\langle x,y+z\rangle = \langle x,y\rangle + \langle x,z\rangle$, as desired.

Inner Product Spaces. Proof (cont.). (2) Scalar multiplication: To show $\langle x,\alpha y\rangle = \alpha\langle x,y\rangle$ for integer $\alpha$ we can just repeatedly apply the additivity property just proved. From this we can get the property for rational $\alpha$ as follows. We let $\alpha = \frac{\beta}{\gamma}$ with integers $\beta, \gamma\ne 0$, so that $\beta\gamma\langle x,y\rangle = \langle\gamma x,\beta y\rangle = \gamma^2\left\langle x,\tfrac{\beta}{\gamma}y\right\rangle$. Dividing by $\gamma^2$ we get $\frac{\beta}{\gamma}\langle x,y\rangle = \left\langle x,\tfrac{\beta}{\gamma}y\right\rangle$.

Inner Product Spaces. Proof (cont.). Finally, for real $\alpha$ we use the continuity of the norm function (see HW), which implies that our inner product $\langle\cdot,\cdot\rangle$ is also continuous. Now we take a sequence $\{\alpha_n\}$ of rational numbers such that $\alpha_n\to\alpha$ for $n\to\infty$ and have by continuity $\langle x,\alpha_n y\rangle \to \langle x,\alpha y\rangle$ as $n\to\infty$.

Inner Product Spaces Theorem The only vector p-norm induced by an inner product is the 2-norm. Remark Since many problems are more easily dealt with in inner product spaces (since we then have lengths and angles, see next section) the 2-norm has a clear advantage over other p-norms. fasshauer@iit.edu MATH 532 50

Inner Product Spaces. Proof. We know that the 2-norm does induce an inner product, i.e., $\|x\|_2 = \sqrt{x^Tx}$. Therefore we need to show that it doesn't work for $p\ne 2$. We do this by showing that the parallelogram identity $\|x+y\|^2 + \|x-y\|^2 = 2\left(\|x\|^2 + \|y\|^2\right)$ fails for $p\ne 2$. We will do this for $1\le p < \infty$. You will work out the case $p=\infty$ in a HW problem.

Inner Product Spaces. Proof (cont.). All we need is a counterexample, so we take $x = e_1$ and $y = e_2$, so that $\|x+y\|_p^2 = \|e_1+e_2\|_p^2 = \left(\sum_{i=1}^n |[e_1+e_2]_i|^p\right)^{2/p} = 2^{2/p}$ and, similarly, $\|x-y\|_p^2 = \|e_1-e_2\|_p^2 = 2^{2/p}$. Together, the left-hand side of the parallelogram identity is $2\left(2^{2/p}\right) = 2^{2/p+1}$.

Inner Product Spaces. Proof (cont.). For the right-hand side of the parallelogram identity we calculate $\|x\|_p^2 = \|e_1\|_p^2 = 1 = \|e_2\|_p^2 = \|y\|_p^2$, so that the right-hand side comes out to 4. Finally, we have $2^{2/p+1} = 4 \iff \tfrac{2}{p}+1 = 2 \iff \tfrac{2}{p} = 1$, or $p = 2$.

Orthogonal Vectors. We will now work in a general inner product space $V$ with induced norm $\|\cdot\| = \sqrt{\langle\cdot,\cdot\rangle}$. Definition. Two vectors $x, y\in V$ are called orthogonal if $\langle x,y\rangle = 0$. We often use the notation $x\perp y$.

Orthogonal Vectors. In the HW you will prove the Pythagorean theorem for the 2-norm and standard inner product $x^Ty$, i.e., $\|x\|^2 + \|y\|^2 = \|x-y\|^2 \iff x^Ty = 0$. Moreover, the law of cosines states $\|x-y\|^2 = \|x\|^2 + \|y\|^2 - 2\|x\|\,\|y\|\cos\theta$, so that $\cos\theta = \frac{\|x\|^2 + \|y\|^2 - \|x-y\|^2}{2\|x\|\,\|y\|} \overset{\text{Pythagoras}}{=} \frac{2x^Ty}{2\|x\|\,\|y\|}$. This motivates our general definition of angles: Definition. Let $x, y\in V$. The angle between $x$ and $y$ is defined via $\cos\theta = \frac{\langle x,y\rangle}{\|x\|\,\|y\|}$, $\theta\in[0,\pi]$.

Orthogonal Vectors. Orthonormal sets. Definition. A set $\{u_1, u_2, \dots, u_n\}\subset V$ is called orthonormal if $\langle u_i,u_j\rangle = \delta_{ij}$ (Kronecker delta). Theorem. Every orthonormal set is linearly independent. Corollary. Every orthonormal set of $n$ vectors from an $n$-dimensional vector space $V$ is an orthonormal basis for $V$.

Orthogonal Vectors. Proof (of the theorem). We want to show linear independence, i.e., that $\sum_{j=1}^n\alpha_ju_j = 0 \implies \alpha_j = 0$, $j=1,\dots,n$. To see this is true we take the inner product with $u_i$: $\left\langle u_i,\sum_{j=1}^n\alpha_ju_j\right\rangle = \langle u_i,0\rangle \iff \sum_{j=1}^n\alpha_j\underbrace{\langle u_i,u_j\rangle}_{=\delta_{ij}} = 0 \iff \alpha_i = 0$. Since $i$ was arbitrary this holds for all $i=1,\dots,n$, and we have linear independence.

Orthogonal Vectors Example The standard orthonormal basis of R n is given by {e 1, e 2,..., e n }. Using this basis we can express any x R n as x = x 1 e 1 + x 2 e 2 +... + x n e n, we get a coordinate expansion of x. fasshauer@iit.edu MATH 532 59

Orthogonal Vectors. In fact, any other orthonormal basis provides just as simple a representation of $x$: consider the orthonormal basis $B = \{u_1,u_2,\dots,u_n\}$ and assume $x = \sum_{j=1}^n\alpha_ju_j$ for some appropriate scalars $\alpha_j$. To find these expansion coefficients $\alpha_j$ we take the inner product with $u_i$, i.e., $\langle u_i,x\rangle = \sum_{j=1}^n\alpha_j\underbrace{\langle u_i,u_j\rangle}_{=\delta_{ij}} = \alpha_i$.

Orthogonal Vectors. We therefore have proved: Theorem. Let $\{u_1,u_2,\dots,u_n\}$ be an orthonormal basis for an inner product space $V$. Then any $x\in V$ can be written as $x = \sum_{i=1}^n\langle x,u_i\rangle u_i$. This is a (finite) Fourier expansion with Fourier coefficients $\langle x,u_i\rangle$.

Orthogonal Vectors. Remark. The classical (infinite-dimensional) Fourier series for continuous functions on $(-\pi,\pi)$ uses the orthogonal (but not yet orthonormal) basis $\{1, \sin t, \cos t, \sin 2t, \cos 2t, \dots\}$ and the inner product $\langle f,g\rangle = \int_{-\pi}^{\pi}f(t)g(t)\,dt$.

Orthogonal Vectors. Example. Consider the basis $B = \{u_1,u_2,u_3\} = \left\{(1,0,1)^T,\ (0,1,0)^T,\ (-1,0,1)^T\right\}$. It is clear by inspection that $B$ is an orthogonal subset of $\mathbb{R}^3$, i.e., using the Euclidean inner product we have $u_i^Tu_j = 0$, $i,j = 1,2,3$, $i\ne j$. We can obtain an orthonormal basis by normalizing the vectors, i.e., by computing $v_i = \frac{u_i}{\|u_i\|_2}$, $i=1,2,3$. This yields $v_1 = \frac{1}{\sqrt{2}}(1,0,1)^T$, $v_2 = (0,1,0)^T$, $v_3 = \frac{1}{\sqrt{2}}(-1,0,1)^T$.

Orthogonal Vectors. Example (cont.). The Fourier expansion of $x = (1\ 2\ 3)^T$ is given by $x = \sum_{i=1}^3(x^Tv_i)\,v_i = \frac{4}{\sqrt{2}}\cdot\frac{1}{\sqrt{2}}\begin{pmatrix}1\\0\\1\end{pmatrix} + 2\begin{pmatrix}0\\1\\0\end{pmatrix} + \frac{2}{\sqrt{2}}\cdot\frac{1}{\sqrt{2}}\begin{pmatrix}-1\\0\\1\end{pmatrix} = \begin{pmatrix}2\\0\\2\end{pmatrix} + \begin{pmatrix}0\\2\\0\end{pmatrix} + \begin{pmatrix}-1\\0\\1\end{pmatrix} = \begin{pmatrix}1\\2\\3\end{pmatrix}$.
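
A short numerical check of this Fourier expansion (not part of the slides; the basis vectors are entered as in the example above).

```python
import numpy as np

# the orthogonal basis from the example, normalized to an orthonormal basis
U = np.array([[1.0, 0.0, -1.0],
              [0.0, 1.0,  0.0],
              [1.0, 0.0,  1.0]])            # columns u1, u2, u3
V = U / np.linalg.norm(U, axis=0)           # columns v1, v2, v3

x = np.array([1.0, 2.0, 3.0])

# Fourier coefficients <x, v_i> and the expansion sum_i <x, v_i> v_i
coeffs = V.T @ x
reconstruction = V @ coeffs                  # = sum_i (x^T v_i) v_i

print(coeffs)                                # [4/sqrt(2), 2, 2/sqrt(2)]
assert np.allclose(reconstruction, x)
```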

Gram–Schmidt Orthogonalization & QR Factorization. We want to convert an arbitrary basis $\{x_1,x_2,\dots,x_n\}$ of $V$ to an orthonormal basis $\{u_1,u_2,\dots,u_n\}$. Idea: construct $u_1,u_2,\dots,u_n$ successively so that $\{u_1,u_2,\dots,u_k\}$ is an ON basis for $\operatorname{span}\{x_1,x_2,\dots,x_k\}$, $k=1,\dots,n$.

Gram–Schmidt Orthogonalization & QR Factorization. Construction. $k=1$: $u_1 = \frac{x_1}{\|x_1\|}$. $k=2$: Consider the projection of $x_2$ onto $u_1$, i.e., $\langle u_1,x_2\rangle u_1$. Then $v_2 = x_2 - \langle u_1,x_2\rangle u_1$ and $u_2 = \frac{v_2}{\|v_2\|}$.

Gram–Schmidt Orthogonalization & QR Factorization. In general, consider $\{u_1,\dots,u_k\}$ as a given ON basis for $\operatorname{span}\{x_1,\dots,x_k\}$. Use the Fourier expansion to express $x_{k+1}$ with respect to $\{u_1,\dots,u_{k+1}\}$: $x_{k+1} = \sum_{i=1}^{k+1}\langle u_i,x_{k+1}\rangle u_i = \sum_{i=1}^{k}\langle u_i,x_{k+1}\rangle u_i + \langle u_{k+1},x_{k+1}\rangle u_{k+1}$, so that $u_{k+1} = \frac{x_{k+1} - \sum_{i=1}^{k}\langle u_i,x_{k+1}\rangle u_i}{\langle u_{k+1},x_{k+1}\rangle} = \frac{v_{k+1}}{\langle u_{k+1},x_{k+1}\rangle}$. This vector, however, is not yet normalized.

Gram–Schmidt Orthogonalization & QR Factorization. We now want $\|u_{k+1}\| = 1$, i.e., $\left\|\frac{v_{k+1}}{\langle u_{k+1},x_{k+1}\rangle}\right\| = 1 \iff \|v_{k+1}\| = |\langle u_{k+1},x_{k+1}\rangle|$. Therefore $\langle u_{k+1},x_{k+1}\rangle = \pm\left\|x_{k+1} - \sum_{i=1}^{k}\langle u_i,x_{k+1}\rangle u_i\right\|$. Since the factor $\pm 1$ changes neither the span, nor orthogonality, nor normalization, we can pick the positive sign.

Gram–Schmidt Orthogonalization & QR Factorization. Gram–Schmidt algorithm. Summarizing, we have $u_1 = \frac{x_1}{\|x_1\|}$, and for $k=2,\dots,n$: $v_k = x_k - \sum_{i=1}^{k-1}\langle u_i,x_k\rangle u_i$, $u_k = \frac{v_k}{\|v_k\|}$.
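
A direct implementation of this classical Gram–Schmidt loop as a sketch (the function name gram_schmidt and the test matrix are illustrative only).

```python
import numpy as np

def gram_schmidt(X):
    """Classical Gram-Schmidt: orthonormalize the columns x_1, ..., x_n of X.

    Assumes the columns of X are linearly independent; returns U whose
    leading k columns span the same subspace as the leading k columns of X.
    """
    m, n = X.shape
    U = np.zeros((m, n))
    for k in range(n):
        # subtract the projections onto the previously computed u_1, ..., u_{k-1}
        v = X[:, k] - U[:, :k] @ (U[:, :k].T @ X[:, k])
        U[:, k] = v / np.linalg.norm(v)
    return U

X = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])
U = gram_schmidt(X)
assert np.allclose(U.T @ U, np.eye(3))   # columns are orthonormal
```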

Gram–Schmidt Orthogonalization & QR Factorization. Using matrix notation to describe Gram–Schmidt. We will assume $V\subseteq\mathbb{R}^m$ (but this also works in the complex case). Let $U_1 = (0,\dots,0)^T\in\mathbb{R}^m$ and for $k=2,3,\dots,n$ let $U_k = \begin{pmatrix}u_1 & u_2 & \cdots & u_{k-1}\end{pmatrix}\in\mathbb{R}^{m\times(k-1)}$.

Gram–Schmidt Orthogonalization & QR Factorization. Then $U_k^Tx_k = \begin{pmatrix}u_1^Tx_k\\u_2^Tx_k\\\vdots\\u_{k-1}^Tx_k\end{pmatrix}$ and $U_kU_k^Tx_k = \begin{pmatrix}u_1 & u_2 & \cdots & u_{k-1}\end{pmatrix}\begin{pmatrix}u_1^Tx_k\\u_2^Tx_k\\\vdots\\u_{k-1}^Tx_k\end{pmatrix} = \sum_{i=1}^{k-1}u_i(u_i^Tx_k) = \sum_{i=1}^{k-1}(u_i^Tx_k)u_i$.

Gram–Schmidt Orthogonalization & QR Factorization. Now, Gram–Schmidt says $v_k = x_k - \sum_{i=1}^{k-1}(u_i^Tx_k)u_i = x_k - U_kU_k^Tx_k = \left(I - U_kU_k^T\right)x_k$, $k=1,2,\dots,n$, where the case $k=1$ is also covered by the special definition of $U_1$. Remark. $U_kU_k^T$ is a projection matrix, and $I - U_kU_k^T$ is a complementary projection. We will cover these later.

Gram–Schmidt Orthogonalization & QR Factorization. QR Factorization (via Gram–Schmidt). Consider an $m\times n$ matrix $A$ with $\operatorname{rank}(A) = n$. We want to convert the set of columns of $A$, $\{a_1,a_2,\dots,a_n\}$, to an ON basis $\{q_1,q_2,\dots,q_n\}$ of $R(A)$. From our discussion of Gram–Schmidt we know $q_1 = \frac{a_1}{\|a_1\|}$, and for $k=2,\dots,n$: $v_k = a_k - \sum_{i=1}^{k-1}\langle q_i,a_k\rangle q_i$, $q_k = \frac{v_k}{\|v_k\|}$.

Gram–Schmidt Orthogonalization & QR Factorization. We now rewrite as follows: $a_1 = \|a_1\|q_1$ and $a_k = \langle q_1,a_k\rangle q_1 + \dots + \langle q_{k-1},a_k\rangle q_{k-1} + \|v_k\|q_k$, $k=2,\dots,n$. We also introduce the new notation $r_{11} = \|a_1\|$, $r_{kk} = \|v_k\|$, $k=2,\dots,n$. Then $A = \begin{pmatrix}a_1 & a_2 & \cdots & a_n\end{pmatrix} = \underbrace{\begin{pmatrix}q_1 & q_2 & \cdots & q_n\end{pmatrix}}_{=Q}\underbrace{\begin{pmatrix}r_{11} & \langle q_1,a_2\rangle & \cdots & \langle q_1,a_n\rangle\\ & r_{22} & \cdots & \langle q_2,a_n\rangle\\ & & \ddots & \vdots\\ O & & & r_{nn}\end{pmatrix}}_{=R}$, and we have the reduced QR factorization of $A$.

Gram–Schmidt Orthogonalization & QR Factorization. Remark. The matrix $Q$ is $m\times n$ with orthonormal columns. The matrix $R$ is $n\times n$ upper triangular with positive diagonal entries. The reduced QR factorization is unique (see HW).

Gram–Schmidt Orthogonalization & QR Factorization. Example. Find the QR factorization of the matrix $A = \begin{pmatrix}1 & 2 & 0\\0 & 1 & 1\\1 & 0 & 1\end{pmatrix}$. $q_1 = \frac{a_1}{\|a_1\|} = \frac{1}{\sqrt{2}}\begin{pmatrix}1\\0\\1\end{pmatrix}$, $r_{11} = \|a_1\| = \sqrt{2}$. $v_2 = a_2 - (q_1^Ta_2)q_1$ with $q_1^Ta_2 = \frac{2}{\sqrt{2}} = \sqrt{2} = r_{12}$, so $v_2 = \begin{pmatrix}2\\1\\0\end{pmatrix} - \begin{pmatrix}1\\0\\1\end{pmatrix} = \begin{pmatrix}1\\1\\-1\end{pmatrix}$, $\|v_2\| = \sqrt{3} = r_{22}$, and $q_2 = \frac{v_2}{\|v_2\|} = \frac{1}{\sqrt{3}}\begin{pmatrix}1\\1\\-1\end{pmatrix}$.

Gram–Schmidt Orthogonalization & QR Factorization. Example (cont.). $v_3 = a_3 - (q_1^Ta_3)q_1 - (q_2^Ta_3)q_2$ with $q_1^Ta_3 = \frac{1}{\sqrt{2}} = r_{13}$ and $q_2^Ta_3 = 0 = r_{23}$. Thus $v_3 = \begin{pmatrix}0\\1\\1\end{pmatrix} - \frac{1}{2}\begin{pmatrix}1\\0\\1\end{pmatrix} = \frac{1}{2}\begin{pmatrix}-1\\2\\1\end{pmatrix}$, $\|v_3\| = \frac{\sqrt{6}}{2} = r_{33}$. So $q_3 = \frac{v_3}{\|v_3\|} = \frac{1}{\sqrt{6}}\begin{pmatrix}-1\\2\\1\end{pmatrix}$.

Gram–Schmidt Orthogonalization & QR Factorization. Example (cont.). Together we have $Q = \begin{pmatrix}\frac{1}{\sqrt{2}} & \frac{1}{\sqrt{3}} & -\frac{1}{\sqrt{6}}\\0 & \frac{1}{\sqrt{3}} & \frac{2}{\sqrt{6}}\\\frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{3}} & \frac{1}{\sqrt{6}}\end{pmatrix}$, $R = \begin{pmatrix}\sqrt{2} & \sqrt{2} & \frac{1}{\sqrt{2}}\\0 & \sqrt{3} & 0\\0 & 0 & \frac{\sqrt{6}}{2}\end{pmatrix}$.

Gram–Schmidt Orthogonalization & QR Factorization. Solving linear systems with the QR factorization. Recall the use of the LU factorization to solve $Ax = b$. Now, $A = QR$ implies $Ax = b \iff QRx = b$. In the special case of a nonsingular $n\times n$ matrix $A$ the matrix $Q$ is also $n\times n$ with ON columns, so that $Q^{-1} = Q^T$ (since $Q^TQ = I$) and $QRx = b \iff Rx = Q^Tb$.

Gram–Schmidt Orthogonalization & QR Factorization. Therefore we solve $Ax = b$ by the following steps: (1) Compute $A = QR$. (2) Compute $y = Q^Tb$. (3) Solve the upper triangular system $Rx = y$. Remark. This procedure is comparable to the three-step LU solution procedure.
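
A sketch of these three steps using NumPy's built-in QR and SciPy's triangular solver (standard library calls, not the slides' own code); A and b are the example data from above and an arbitrary right-hand side.

```python
import numpy as np
from scipy.linalg import solve_triangular

A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])
b = np.array([1.0, 2.0, 3.0])

# Step 1: compute A = QR (A is square, so Q is orthogonal)
Q, R = np.linalg.qr(A)

# Step 2: y = Q^T b
y = Q.T @ b

# Step 3: back substitution on the upper triangular system R x = y
x = solve_triangular(R, y, lower=False)

assert np.allclose(A @ x, b)
```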

Gram–Schmidt Orthogonalization & QR Factorization. The real advantage of the QR factorization lies in the solution of least squares problems. Consider $Ax = b$ with $A\in\mathbb{R}^{m\times n}$ and $\operatorname{rank}(A) = n$ (so that a unique least squares solution exists). We know that the least squares solution is given by the solution of the normal equations $A^TAx = A^Tb$. Using the QR factorization of $A$ this becomes $(QR)^TQRx = (QR)^Tb \iff R^T\underbrace{Q^TQ}_{=I}Rx = R^TQ^Tb \iff R^TRx = R^TQ^Tb$. Now $R$ is upper triangular with positive diagonal and therefore invertible. Therefore solving the normal equations corresponds to solving (cf. the previous discussion) $Rx = Q^Tb$.

Gram–Schmidt Orthogonalization & QR Factorization. Remark. This is the same as the QR factorization applied to a square and consistent system $Ax = b$. Summary. The QR factorization provides a simple and efficient way to solve least squares problems. The ill-conditioned matrix $A^TA$ is never computed. If it is required, then it can be computed from $R$ as $R^TR$ (in fact, this is the Cholesky factorization of $A^TA$).
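
A least squares sketch (not from the slides) comparing the QR-based solve $Rx = Q^Tb$ with the normal equations (formed only for this check) and numpy.linalg.lstsq on a random overdetermined system.

```python
import numpy as np
from scipy.linalg import solve_triangular

rng = np.random.default_rng(3)
m, n = 20, 4
A = rng.standard_normal((m, n))          # full column rank with probability 1
b = rng.standard_normal(m)

# least squares via the reduced QR factorization: solve R x = Q^T b
Q, R = np.linalg.qr(A)                   # Q: m x n, R: n x n
x_qr = solve_triangular(R, Q.T @ b, lower=False)

# compare with the normal equations and with lstsq
x_normal = np.linalg.solve(A.T @ A, A.T @ b)
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

assert np.allclose(x_qr, x_normal)
assert np.allclose(x_qr, x_lstsq)
```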

Gram–Schmidt Orthogonalization & QR Factorization. Modified Gram–Schmidt. There is still a problem with the QR factorization via Gram–Schmidt: it is not numerically stable (see HW). A better, but still not ideal, approach is provided by the modified Gram–Schmidt algorithm. Idea: rearrange the order of calculation, i.e., write the projection matrices $U_kU_k^T = \sum_{i=1}^{k-1}u_iu_i^T$ as a sum of rank-1 projections.

Gram–Schmidt Orthogonalization & QR Factorization. MGS Algorithm. $k=1$: $u_1 \leftarrow \frac{x_1}{\|x_1\|}$, $u_j \leftarrow x_j$, $j=2,\dots,n$. For $k = 2:n$: $E_k = I - u_{k-1}u_{k-1}^T$; for $j = k,\dots,n$: $u_j \leftarrow E_ku_j$; then $u_k \leftarrow \frac{u_k}{\|u_k\|}$.
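
A sketch of the MGS loop in NumPy (function name and test matrix are illustrative); the rank-1 projector is applied to all remaining columns exactly as in the algorithm above.

```python
import numpy as np

def modified_gram_schmidt(X):
    """Modified Gram-Schmidt: orthonormalize the columns of X.

    At step k the rank-1 projector E_k = I - u_{k-1} u_{k-1}^T is applied to
    all remaining columns, which preserves orthogonality better in floating
    point than classical Gram-Schmidt.
    """
    U = X.astype(float).copy()
    m, n = U.shape
    U[:, 0] = U[:, 0] / np.linalg.norm(U[:, 0])
    for k in range(1, n):
        u_prev = U[:, k - 1]
        # apply E_k to columns k, ..., n-1
        U[:, k:] -= np.outer(u_prev, u_prev @ U[:, k:])
        U[:, k] = U[:, k] / np.linalg.norm(U[:, k])
    return U

X = np.vander(np.linspace(0, 1, 8), 5, increasing=True)   # mildly ill-conditioned
U = modified_gram_schmidt(X)
print(np.linalg.norm(U.T @ U - np.eye(5)))                 # close to machine precision
```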

Gram Schmidt Orthogonalization & QR Factorization Remark The MGS algorithm is theoretically equivalent to the GS algorithm, i.e., in exact arithmetic, but in practice it preserves orthogonality better. Most stable implementations of the QR factorization use Householder reflections or Givens rotations (more later). Householder reflections are also more efficient than MGS. fasshauer@iit.edu MATH 532 86

Unitary and Orthogonal Matrices. Definition. A real (complex) $n\times n$ matrix is called orthogonal (unitary) if its columns form an orthonormal basis for $\mathbb{R}^n$ ($\mathbb{C}^n$). Theorem. Let $U$ be an orthogonal $n\times n$ matrix. Then (1) $U$ has orthonormal rows; (2) $U^{-1} = U^T$; (3) $\|Ux\|_2 = \|x\|_2$ for all $x\in\mathbb{R}^n$, i.e., $U$ is an isometry. Remark. Analogous properties for unitary matrices are formulated and proved in [Mey00].

Unitary and Orthogonal Matrices. Proof. (2) By definition $U = \begin{pmatrix}u_1 & \cdots & u_n\end{pmatrix}$ has orthonormal columns, i.e., $u_i^Tu_j = \delta_{ij} \iff (U^TU)_{ij} = \delta_{ij} \iff U^TU = I$. But $U^TU = I$ implies $U^T = U^{-1}$. (1) Therefore the statement about orthonormal rows follows from $UU^{-1} = UU^T = I$.

Unitary and Orthogonal Matrices. Proof (cont.). (3) To show that $U$ is an isometry we assume $U$ is orthogonal. Then, for any $x\in\mathbb{R}^n$, $\|Ux\|_2^2 = (Ux)^T(Ux) = x^T\underbrace{U^TU}_{=I}x = \|x\|_2^2$.

Unitary and Orthogonal Matrices. Remark. The converse of (3) is also true, i.e., if $\|Ux\|_2 = \|x\|_2$ for all $x\in\mathbb{R}^n$ then $U$ must be orthogonal. Consider $x = e_i$. Then $\|Ue_i\|_2^2 = u_i^Tu_i \overset{(3)}{=} \|e_i\|_2^2 = 1$, so the columns of $U$ have norm 1. Moreover, for $x = e_i + e_j$ ($i\ne j$) we get $\|U(e_i+e_j)\|_2^2 = \underbrace{u_i^Tu_i}_{=1} + u_i^Tu_j + u_j^Tu_i + \underbrace{u_j^Tu_j}_{=1} \overset{(3)}{=} \|e_i+e_j\|_2^2 = 2$, so that $u_i^Tu_j = 0$ for $i\ne j$ and the columns of $U$ are orthogonal.

Unitary and Orthogonal Matrices. Example. The simplest orthogonal matrix is the identity matrix $I$. Permutation matrices are orthogonal, e.g., $P = \begin{pmatrix}1 & 0 & 0\\0 & 0 & 1\\0 & 1 & 0\end{pmatrix}$. In fact, for this permutation matrix we even have $P^T = P$, so that $P^TP = P^2 = I$. Such matrices are called involutary (see pretest). An orthogonal matrix can be viewed as a unitary matrix, but a unitary matrix may not be orthogonal. For example, for $A = \frac{1}{\sqrt{2}}\begin{pmatrix}1 & i\\1 & -i\end{pmatrix}$ we have $A^*A = AA^* = I$, but $A^TA \ne I \ne AA^T$.

Unitary and Orthogonal Matrices. Elementary Orthogonal Projectors. Definition. A matrix $Q$ of the form $Q = I - uu^T$, $u\in\mathbb{R}^n$, $\|u\|_2 = 1$, is called an elementary orthogonal projector. Remark. Note that $Q$ is not an orthogonal matrix: $Q^T = (I-uu^T)^T = I-uu^T = Q$.

Unitary and Orthogonal Matrices. All projectors are idempotent, i.e., $Q^2 = Q$: $Q^TQ \overset{\text{above}}{=} Q^2 = (I-uu^T)(I-uu^T) = I - 2uu^T + u\underbrace{u^Tu}_{=1}u^T = I - uu^T = Q$.

Unitary and Orthogonal Matrices. Geometric interpretation. Consider $x = (I-Q)x + Qx$ and observe that $(I-Q)x \perp Qx$: $\left((I-Q)x\right)^TQx = x^T(I-Q^T)Qx = x^T(Q - \underbrace{Q^TQ}_{=Q})x = 0$. Also, $(I-Q)x = (uu^T)x = u(u^Tx)\in\operatorname{span}\{u\}$. Therefore $Qx\in u^\perp$, the orthogonal complement of $u$. Also note that $\|(u^Tx)u\| = |u^Tx|\,\underbrace{\|u\|_2}_{=1}$, so that $|u^Tx|$ is the length of the orthogonal projection of $x$ onto $\operatorname{span}\{u\}$.

Unitary and Orthogonal Matrices. Summary. $(I-Q)x\in\operatorname{span}\{u\}$, so $I-Q = uu^T = P_u$ is a projection onto $\operatorname{span}\{u\}$. $Qx\in u^\perp$, so $Q = I - uu^T = P_{u^\perp}$ is a projection onto $u^\perp$.

Unitary and Orthogonal Matrices. Remark. Above we assumed that $\|u\|_2 = 1$. For an arbitrary vector $v$ we get a unit vector $u = \frac{v}{\|v\|_2} = \frac{v}{\sqrt{v^Tv}}$. Therefore, for general $v$, $P_v = \frac{vv^T}{v^Tv}$ is a projection onto $\operatorname{span}\{v\}$, and $P_{v^\perp} = I - P_v = I - \frac{vv^T}{v^Tv}$ is a projection onto $v^\perp$.

Unitary and Orthogonal Matrices. Elementary Reflections. Definition. Let $v(\ne 0)\in\mathbb{R}^n$. Then $R = I - 2\frac{vv^T}{v^Tv}$ is called the elementary (or Householder) reflector about $v$. Remark. For $u\in\mathbb{R}^n$ with $\|u\|_2 = 1$ we have $R = I - 2uu^T$.

Unitary and Orthogonal Matrices. Geometric interpretation. Consider $\|u\|_2 = 1$, and note that $Qx = (I-uu^T)x$ is the orthogonal projection of $x$ onto $u^\perp$ as above. Also, $Q(Rx) = Q(I-2uu^T)x = Q(I-2(I-Q))x = (Q - 2Q + 2\underbrace{Q^2}_{=Q})x = Qx$, so that $Qx$ is also the orthogonal projection of $Rx$ onto $u^\perp$. Moreover, $\|x - Qx\| = \|x - (I-uu^T)x\| = |u^Tx|\,\|u\| = |u^Tx|$ and $\|Qx - Rx\| = \|(Q-R)x\| = \|\left(I-uu^T - (I-2uu^T)\right)x\| = \|uu^Tx\| = |u^Tx|$. Together, $Rx$ is the reflection of $x$ about $u^\perp$.

Unitary and Orthogonal Matrices Properties of elementary reflections Theorem Let R be an elementary reflector. Then R 1 = R T = R, i.e., R is orthogonal, symmetric, and involutary. Remark However, these properties do not characterize a reflection, i.e., an orthogonal, symmetric and involutary matrix is not necessarily a reflection (see HW). fasshauer@iit.edu MATH 532 100

Unitary and Orthogonal Matrices. Proof. $R^T = (I-2uu^T)^T = I-2uu^T = R$. Also, $R^2 = (I-2uu^T)(I-2uu^T) = I - 4uu^T + 4u\underbrace{u^Tu}_{=1}u^T = I$, so that $R^{-1} = R$.

Unitary and Orthogonal Matrices. Reflection of $x$ onto $e_1$. If we can construct a matrix $R$ such that $Rx = \alpha e_1$, then we can use $R$ to zero out entries in (the first column of) a matrix. To this end consider $v = x \pm \mu\|x\|_2e_1$, where $\mu = 1$ if $x_1$ is real and $\mu = \frac{x_1}{|x_1|}$ if $x_1$ is complex, and note $v^Tv = (x\pm\mu\|x\|_2e_1)^T(x\pm\mu\|x\|_2e_1) = x^Tx \pm 2\mu\|x\|_2e_1^Tx + \underbrace{\mu^2}_{=1}\|x\|_2^2 = 2(x^Tx \pm \mu\|x\|_2e_1^Tx) = 2v^Tx$. (9)

Unitary and Orthogonal Matrices. Our Householder reflection was defined as $R = I - 2\frac{vv^T}{v^Tv}$, so that $Rx = x - 2\frac{vv^Tx}{v^Tv} = x - \underbrace{\frac{2v^Tx}{v^Tv}}_{\overset{(9)}{=}1}v = x - v = \mp\mu\|x\|_2e_1 = \alpha e_1$. Remark. These special reflections are used in the Householder variant of the QR factorization. For optimal numerical stability of real matrices one lets $\mu = \operatorname{sign}(x_1)$.
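
A small sketch (the helper name householder_vector is illustrative) constructing $v = x + \operatorname{sign}(x_1)\|x\|e_1$ and checking that the resulting reflector maps $x$ onto a multiple of $e_1$.

```python
import numpy as np

def householder_vector(x):
    """Return v such that (I - 2 vv^T / v^T v) x = -sign(x[0]) * ||x|| * e_1.

    Uses mu = sign(x[0]) for numerical stability, as recommended in the slides.
    """
    v = x.astype(float).copy()
    sign = 1.0 if x[0] >= 0 else -1.0
    v[0] += sign * np.linalg.norm(x)
    return v

x = np.array([3.0, 1.0, -2.0])
v = householder_vector(x)
R = np.eye(len(x)) - 2.0 * np.outer(v, v) / (v @ v)

print(R @ x)                                    # approximately [-||x||, 0, 0]
assert np.allclose(R @ x, [-np.linalg.norm(x), 0.0, 0.0])
assert np.allclose(R @ R.T, np.eye(3))          # R is orthogonal (and involutary)
```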

Unitary and Orthogonal Matrices. Remark. Since $R^2 = I$ ($R^{-1} = R$) we have, whenever $\|x\|_2 = 1$, $Rx = \mu e_1 \implies R^2x = \mu Re_1 \implies x = \mu R_{*1}$. Therefore the matrix $U = R$ (taking $\mu = 1$) is orthogonal (since $R$ is) and contains $x$ as its first column. Thus, this allows us to construct an ON basis for $\mathbb{R}^n$ that contains $x$ (see example in [Mey00]).

Unitary and Orthogonal Matrices. Rotations. We give only a brief overview (more details can be found in [Mey00]). We begin in $\mathbb{R}^2$ and look for a matrix representation of the rotation of a vector $u$ into another vector $v$, counterclockwise by an angle $\theta$. Here $u = \begin{pmatrix}u_1\\u_2\end{pmatrix} = \begin{pmatrix}\|u\|\cos\phi\\\|u\|\sin\phi\end{pmatrix}$ (10) and $v = \begin{pmatrix}v_1\\v_2\end{pmatrix} = \begin{pmatrix}\|v\|\cos(\phi+\theta)\\\|v\|\sin(\phi+\theta)\end{pmatrix}$. (11)

Unitary and Orthogonal Matrices. We use the trigonometric identities $\cos(A+B) = \cos A\cos B - \sin A\sin B$ and $\sin(A+B) = \sin A\cos B + \sin B\cos A$ with $A = \phi$, $B = \theta$ and $\|v\| = \|u\|$ to get $v \overset{(11)}{=} \begin{pmatrix}\|v\|\cos(\phi+\theta)\\\|v\|\sin(\phi+\theta)\end{pmatrix} = \begin{pmatrix}\|u\|(\cos\phi\cos\theta - \sin\phi\sin\theta)\\\|u\|(\sin\phi\cos\theta + \sin\theta\cos\phi)\end{pmatrix} \overset{(10)}{=} \begin{pmatrix}u_1\cos\theta - u_2\sin\theta\\u_2\cos\theta + u_1\sin\theta\end{pmatrix} = \begin{pmatrix}\cos\theta & -\sin\theta\\\sin\theta & \cos\theta\end{pmatrix}u = Pu$, where $P$ is the rotation matrix.

Unitary and Orthogonal Matrices. Remark. Note that $P^TP = \begin{pmatrix}\cos\theta & \sin\theta\\-\sin\theta & \cos\theta\end{pmatrix}\begin{pmatrix}\cos\theta & -\sin\theta\\\sin\theta & \cos\theta\end{pmatrix} = \begin{pmatrix}\cos^2\theta+\sin^2\theta & -\cos\theta\sin\theta+\cos\theta\sin\theta\\-\cos\theta\sin\theta+\cos\theta\sin\theta & \sin^2\theta+\cos^2\theta\end{pmatrix} = I$, so that $P$ is an orthogonal matrix. $P^T$ is also a rotation matrix (by an angle $-\theta$).

Unitary and Orthogonal Matrices. Rotations about a coordinate axis in $\mathbb{R}^3$ are very similar. Such rotations are referred to as plane rotations. For example, rotation about the x-axis (in the yz-plane) is accomplished with $P_{yz} = \begin{pmatrix}1 & 0 & 0\\0 & \cos\theta & -\sin\theta\\0 & \sin\theta & \cos\theta\end{pmatrix}$. Rotation about the y- and z-axes is done analogously.

Unitary and Orthogonal Matrices. We can use the same ideas for plane rotations in higher dimensions. Definition. An orthogonal matrix $P_{ij}$ that agrees with the identity except for the entries $(P_{ij})_{ii} = (P_{ij})_{jj} = c$, $(P_{ij})_{ij} = s$, $(P_{ij})_{ji} = -s$, with $c^2 + s^2 = 1$, is called a plane rotation (or Givens rotation). Note that the orientation is reversed from the earlier discussion.

Unitary and Orthogonal Matrices. Usually we set $c = \frac{x_i}{\sqrt{x_i^2+x_j^2}}$, $s = \frac{x_j}{\sqrt{x_i^2+x_j^2}}$, since then for $x = (x_1\ \cdots\ x_n)^T$ the vector $P_{ij}x$ agrees with $x$ except in positions $i$ and $j$, where $(P_{ij}x)_i = cx_i + sx_j = \sqrt{x_i^2+x_j^2}$ and $(P_{ij}x)_j = -sx_i + cx_j = 0$. This shows that $P_{ij}$ zeros the $j$th component of $x$.
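
A sketch constructing the Givens rotation $P_{ij}$ with this choice of $c$ and $s$ and checking that it zeros the $j$th component (0-based indices; the helper name is illustrative).

```python
import numpy as np

def givens_rotation(x, i, j):
    """Return the Givens rotation P_ij that zeros component j of x."""
    r = np.hypot(x[i], x[j])
    c, s = x[i] / r, x[j] / r
    P = np.eye(len(x))
    P[i, i], P[j, j] = c, c
    P[i, j], P[j, i] = s, -s
    return P

x = np.array([1.0, 2.0, 3.0])
P = givens_rotation(x, 0, 2)
y = P @ x
print(y)                                   # [sqrt(10), 2, 0]
assert np.isclose(y[2], 0.0)
assert np.allclose(P @ P.T, np.eye(3))     # P is orthogonal
```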

Unitary and Orthogonal Matrices. Note that $\frac{x_i^2+x_j^2}{\sqrt{x_i^2+x_j^2}} = \sqrt{x_i^2+x_j^2}$, so that repeatedly applying Givens rotations $P_{ij}$ with the same $i$ but different values of $j$ will zero out all but the $i$th component of $x$, and that component will become $\sqrt{x_1^2 + \dots + x_n^2} = \|x\|_2$. Therefore, the sequence $P = P_{in}\cdots P_{i,i+1}P_{i,i-1}\cdots P_{i1}$ of Givens rotations rotates the vector $x\in\mathbb{R}^n$ onto $e_i$, i.e., $Px = \|x\|_2e_i$. Moreover, the matrix $P$ is orthogonal.

Unitary and Orthogonal Matrices Remark Givens rotations can be used as an alternative to Householder reflections to construct a QR factorization. Householder reflections are in general more efficient, but for sparse matrices Givens rotations are more efficient because they can be applied more selectively. fasshauer@iit.edu MATH 532 112

Orthogonal Reduction. Recall the form of LU factorization (Gaussian elimination): $T_{n-1}\cdots T_2T_1A = U$, where the $T_k$ are lower triangular and $U$ is upper triangular, i.e., we have a triangular reduction. For the QR factorization we will use orthogonal Householder reflectors $R_k$ to get $R_{n-1}\cdots R_2R_1A = T$, where $T$ is upper triangular, i.e., we have an orthogonal reduction.

Orthogonal Reduction. Recall Householder reflectors $R = I - 2\frac{vv^T}{v^Tv}$, with $v = x \pm \mu\|x\|e_1$, so that $Rx = \mp\mu\|x\|e_1$, and $\mu = 1$ for real $x$. Now we explain how to use these Householder reflectors to convert an $m\times n$ matrix $A$ to an upper triangular matrix of the same size, i.e., how to do a full QR factorization.

Orthogonal Reduction. Apply a Householder reflector to the first column of $A$: $R_1A_{*1} = \left(I - 2\frac{vv^T}{v^Tv}\right)A_{*1}$ with $v = A_{*1} \pm \|A_{*1}\|e_1$, so that $R_1A_{*1} = \mp\|A_{*1}\|e_1 = (t_{11}, 0, \dots, 0)^T$. Then $R_1$ applied to all of $A$ yields $R_1A = \begin{pmatrix}t_{11} & t_{12} & \cdots & t_{1n}\\0 & & & \\\vdots & & A_2 & \\0 & & & \end{pmatrix} = \begin{pmatrix}t_{11} & t_1^T\\0 & A_2\end{pmatrix}$.

Orthogonal Reduction. Next, we apply the same idea to $A_2$, i.e., we let $R_2 = \begin{pmatrix}1 & 0^T\\0 & \hat{R}_2\end{pmatrix}$. Then $R_2R_1A = \begin{pmatrix}t_{11} & t_1^T\\0 & \hat{R}_2A_2\end{pmatrix} = \begin{pmatrix}t_{11} & t_{12} & \cdots & t_{1n}\\0 & t_{22} & & t_2^T\\0 & 0 & & A_3\end{pmatrix}$.

Orthogonal Reduction. We continue the process until we get an upper triangular matrix, i.e., $\underbrace{R_n\cdots R_2R_1}_{=P}A = \begin{pmatrix}t_{11} & & *\\ & \ddots & \\ & & t_{nn}\\ & O & \end{pmatrix}$ whenever $m > n$, or $\underbrace{R_m\cdots R_2R_1}_{=P}A = \begin{pmatrix}t_{11} & & & *\\ & \ddots & & \\ & & t_{mm} & \end{pmatrix}$ whenever $n > m$. Since each $R_k$ is orthogonal (unitary for complex $A$) we have $PA = T$ with $P$ an $m\times m$ orthogonal matrix and $T$ an $m\times n$ upper triangular matrix, i.e., $A = QR$ ($Q = P^T$, $R = T$).
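
A compact sketch of this orthogonal reduction, accumulating the Householder reflectors into $Q = P^T$ (the helper name householder_qr is illustrative, and no attempt is made to enforce a positive diagonal).

```python
import numpy as np

def householder_qr(A):
    """Full QR via Householder reflections: returns orthogonal Q (m x m) and
    upper triangular R (m x n) with A = Q R, following R_k ... R_1 A = T."""
    A = A.astype(float)
    m, n = A.shape
    Q, R = np.eye(m), A.copy()
    for k in range(min(m - 1, n)):
        x = R[k:, k]
        v = x.copy()
        sign = 1.0 if x[0] >= 0 else -1.0
        v[0] += sign * np.linalg.norm(x)
        if v @ v == 0:
            continue
        # apply the reflector I - 2 vv^T / v^T v to the trailing block of R,
        # and accumulate it (from the right) into Q
        R[k:, k:] -= 2.0 * np.outer(v, v @ R[k:, k:]) / (v @ v)
        Q[:, k:] -= 2.0 * np.outer(Q[:, k:] @ v, v) / (v @ v)
    return Q, R

A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])
Q, R = householder_qr(A)
assert np.allclose(Q @ R, A)
assert np.allclose(Q.T @ Q, np.eye(3))
assert np.allclose(np.tril(R, -1), 0.0)
```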

Orthogonal Reduction. Remark. This is similar to obtaining the QR factorization via MGS, but now $Q$ is orthogonal (square) and $R$ is rectangular. This gives us the full QR factorization, whereas MGS gave us the reduced QR factorization (with $m\times n$ $Q$ and $n\times n$ $R$).

Orthogonal Reduction. Example. We use Householder reflections to find the QR factorization (where $R$ has positive diagonal elements) of $A = \begin{pmatrix}1 & 2 & 0\\0 & 1 & 1\\1 & 0 & 1\end{pmatrix}$. Here $R_1 = I - 2\frac{v_1v_1^T}{v_1^Tv_1}$, with $v_1 = A_{*1} \pm \|A_{*1}\|e_1$, so that $R_1A_{*1} = \mp\|A_{*1}\|e_1 = \mp\begin{pmatrix}\sqrt{2}\\0\\0\end{pmatrix}$. Thus we take the $\pm$ sign as $-$, so that $t_{11} = \sqrt{2} > 0$.

Orthogonal Reduction. Example (cont.). To find $R_1A$ we can either compute $R_1$ using the formula above and then compute the matrix-matrix product, or more cheaply note that $R_1x = \left(I - 2\frac{v_1v_1^T}{v_1^Tv_1}\right)x = x - \frac{2v_1^Tx}{v_1^Tv_1}v_1$, so that we can compute $v_1^TA_{*j}$, $j = 2, 3$, instead of the full $R_1$. With $v_1 = (1-\sqrt{2},\,0,\,1)^T$: $v_1^TA_{*2} = (1-\sqrt{2})\cdot 2 + 0\cdot 1 + 1\cdot 0 = 2 - 2\sqrt{2}$ and $v_1^TA_{*3} = (1-\sqrt{2})\cdot 0 + 0\cdot 1 + 1\cdot 1 = 1$. Also $\frac{2v_1}{v_1^Tv_1} = \frac{1}{2-\sqrt{2}}\begin{pmatrix}1-\sqrt{2}\\0\\1\end{pmatrix}$.

Orthogonal Reduction. Example (cont.). Therefore $R_1A_{*2} = \begin{pmatrix}2\\1\\0\end{pmatrix} - \frac{2-2\sqrt{2}}{2-\sqrt{2}}\begin{pmatrix}1-\sqrt{2}\\0\\1\end{pmatrix} = \begin{pmatrix}\sqrt{2}\\1\\\sqrt{2}\end{pmatrix}$ and $R_1A_{*3} = \begin{pmatrix}0\\1\\1\end{pmatrix} - \frac{1}{2-\sqrt{2}}\begin{pmatrix}1-\sqrt{2}\\0\\1\end{pmatrix} = \begin{pmatrix}\frac{1}{\sqrt{2}}\\1\\-\frac{1}{\sqrt{2}}\end{pmatrix}$, so that $R_1A = \begin{pmatrix}\sqrt{2} & \sqrt{2} & \frac{1}{\sqrt{2}}\\0 & 1 & 1\\0 & \sqrt{2} & -\frac{1}{\sqrt{2}}\end{pmatrix}$, $A_2 = \begin{pmatrix}1 & 1\\\sqrt{2} & -\frac{1}{\sqrt{2}}\end{pmatrix}$.

Orthogonal Reduction. Example (cont.). Next, $\hat{R}_2x = x - \frac{2v_2^Tx}{v_2^Tv_2}v_2$ with $v_2 = (A_2)_{*1} - \|(A_2)_{*1}\|e_1 = \begin{pmatrix}1-\sqrt{3}\\\sqrt{2}\end{pmatrix}$. Then $v_2^T(A_2)_{*1} = 3-\sqrt{3}$, $v_2^T(A_2)_{*2} = -\sqrt{3}$, and $\frac{2v_2}{v_2^Tv_2} = \frac{1}{3-\sqrt{3}}\begin{pmatrix}1-\sqrt{3}\\\sqrt{2}\end{pmatrix}$, so $\hat{R}_2(A_2)_{*1} = \begin{pmatrix}\sqrt{3}\\0\end{pmatrix}$ and $\hat{R}_2(A_2)_{*2} = \begin{pmatrix}0\\\frac{\sqrt{6}}{2}\end{pmatrix}$. Using $R_2 = \begin{pmatrix}1 & 0^T\\0 & \hat{R}_2\end{pmatrix}$ we get $\underbrace{R_2R_1}_{=P}A = \begin{pmatrix}\sqrt{2} & \sqrt{2} & \frac{1}{\sqrt{2}}\\0 & \sqrt{3} & 0\\0 & 0 & \frac{\sqrt{6}}{2}\end{pmatrix} = T$.

Orthogonal Reduction. Remark. As mentioned earlier, the factor $R$ of the QR factorization is given by the matrix $T$. The factor $Q = P^T$ is not explicitly given in the example. One could also obtain the same answer using Givens rotations (compare [Mey00, Example 5.7.2]).

Orthogonal Reduction Theorem Let A be an n n nonsingular real matrix. Then the factorization A = QR with n n orthogonal matrix Q and n n upper triangular matrix R with positive diagonal entries is unique. Remark In this n n case the reduced and full QR factorizations coincide, i.e., the results obtained via Gram Schmidt, Householder and Givens should be identical. fasshauer@iit.edu MATH 532 125

Orthogonal Reduction. Proof. Assume we have two QR factorizations $A = Q_1R_1 = Q_2R_2 \implies Q_2^TQ_1 = R_2R_1^{-1} =: U$. Now, $R_2R_1^{-1}$ is upper triangular with positive diagonal (since each factor is) and $Q_2^TQ_1$ is orthogonal. Therefore $U$ has all of these properties. Since $U$ is upper triangular, $U_{*1} = (u_{11}, 0, \dots, 0)^T$. Moreover, since $U$ is orthogonal, $u_{11} = 1$.

Orthogonal Reduction. Proof (cont.). Next, $U_{*1}^TU_{*2} = \begin{pmatrix}1 & 0 & \cdots & 0\end{pmatrix}\begin{pmatrix}u_{12}\\u_{22}\\0\\\vdots\\0\end{pmatrix} = u_{12} = 0$ since the columns of $U$ are orthogonal, and the fact that $\|U_{*2}\| = 1$ (together with $u_{22} > 0$) implies $u_{22} = 1$. Comparing all the other pairs of columns of $U$ shows that $U = I$, and therefore $Q_1 = Q_2$ and $R_1 = R_2$.

Orthogonal Reduction. Recommendations (so far) for the solution of $Ax = b$. (1) If $A$ is square and nonsingular, then use LU factorization with partial pivoting. This is stable for most practical problems and requires $O(\frac{n^3}{3})$ operations. (2) To find a least squares solution, use the QR factorization: $Ax = b \iff QRx = b \iff Rx = Q^Tb$. Usually the reduced QR factorization is all that's needed.

Orthogonal Reduction Even though (for square nonsingular A) the Gram Schmidt, Householder and Givens versions of the QR factorization are equivalent (due to the uniqueness theorem), we have for general A that classical GS is not stable, modified GS is stable for least squares, but unstable for QR (since it has problems maintaining orthogonality), Householder and Givens are stable, both for least squares and QR fasshauer@iit.edu MATH 532 129

Orthogonal Reduction. Computational cost (for $n\times n$ matrices): LU with partial pivoting: $O(\frac{n^3}{3})$; Gram–Schmidt: $O(n^3)$; Householder: $O(\frac{2n^3}{3})$; Givens: $O(\frac{4n^3}{3})$. Householder reflections are often the preferred method since they provide both stability and also decent efficiency.

Complementary Subspaces. Definition. Let $V$ be a vector space and $X, Y\subseteq V$ be subspaces. $X$ and $Y$ are called complementary provided $V = X + Y$ and $X\cap Y = \{0\}$. In this case, $V$ is also called the direct sum of $X$ and $Y$, and we write $V = X\oplus Y$. Example. Any two lines through the origin in $\mathbb{R}^2$ are complementary. Any plane through the origin in $\mathbb{R}^3$ is complementary to any line through the origin not contained in the plane. Two planes through the origin in $\mathbb{R}^3$ are not complementary since they must intersect in a line.

Complementary Subspaces. Theorem. Let $V$ be a vector space, and $X, Y\subseteq V$ be subspaces with bases $B_X$ and $B_Y$. The following are equivalent: (1) $V = X\oplus Y$. (2) For every $v\in V$ there exist unique $x\in X$ and $y\in Y$ such that $v = x + y$. (3) $B_X\cap B_Y = \{\}$ and $B_X\cup B_Y$ is a basis for $V$. Proof. See [Mey00]. Definition. Suppose $V = X\oplus Y$, i.e., any $v\in V$ can be uniquely decomposed as $v = x + y$. Then (1) $x$ is called the projection of $v$ onto $X$ along $Y$; (2) $y$ is called the projection of $v$ onto $Y$ along $X$.

Complementary Subspaces. Properties of projectors. Theorem. Let $X, Y$ be complementary subspaces of $V$. Let $P$, defined by $Pv = x$, be the projector onto $X$ along $Y$. Then (1) $P$ is unique. (2) $P^2 = P$, i.e., $P$ is idempotent. (3) $I - P$ is the complementary projector (onto $Y$ along $X$). (4) $R(P) = \{x : Px = x\} = X$ ("fixed points" for $P$). (5) $N(I-P) = X = R(P)$ and $R(I-P) = N(P) = Y$. (6) If $V = \mathbb{R}^n$ (or $\mathbb{C}^n$), then $P = \begin{pmatrix}X & O\end{pmatrix}\begin{pmatrix}X & Y\end{pmatrix}^{-1} = \begin{pmatrix}X & Y\end{pmatrix}\begin{pmatrix}I & O\\O & O\end{pmatrix}\begin{pmatrix}X & Y\end{pmatrix}^{-1}$, where the columns of $X$ and $Y$ are bases for $X$ and $Y$.
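
A small sketch of property (6); the two complementary subspaces chosen here (a plane and a line in $\mathbb{R}^3$) are a hypothetical example, not from the slides.

```python
import numpy as np

# Columns of X and Y are bases for two complementary subspaces of R^3.
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])          # the xy-plane
Y = np.array([[1.0],
              [1.0],
              [1.0]])               # the line spanned by (1, 1, 1)

B = np.hstack([X, Y])               # [X Y], nonsingular since the subspaces are complementary

# P = [X O] [X Y]^{-1}, the projector onto R(X) along R(Y)
P = np.hstack([X, np.zeros_like(Y)]) @ np.linalg.inv(B)

v = np.array([2.0, 3.0, 5.0])
x_part, y_part = P @ v, (np.eye(3) - P) @ v
print(x_part, y_part)               # x_part lies in the plane, y_part on the line, x_part + y_part = v

assert np.allclose(P @ P, P)        # idempotent
assert np.allclose(P @ X, X)        # fixes R(X)
assert np.allclose(P @ Y, 0.0)      # annihilates R(Y)
```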

Complementary Subspaces. Proof. (1) Assume $P_1v = x = P_2v$ for all $v\in V$. But then $P_1 = P_2$. (2) We know $Pv = x$ for every $v\in V$, so that $P^2v = P(Pv) = Px = x = Pv$. Together we therefore have $P^2 = P$. (3) Using the unique decomposition of $v$ we can write $v = x + y = Pv + y \iff (I-P)v = y$, the projection of $v$ onto $Y$ along $X$.

Complementary Subspaces. Proof (cont.). (4) Note that $x\in R(P)$ if and only if $x = Px$. This is true since if $x = Px$ then $x$ is obviously in $R(P)$. On the other hand, if $x\in R(P)$ then $x = Pv$ for some $v\in V$, and so $Px = P^2v \overset{(2)}{=} Pv = x$. Therefore $R(P) = \{x : x = Pv,\ v\in V\} = X = \{x : Px = x\}$. (5) Since $N(I-P) = \{x : (I-P)x = 0\}$ and $(I-P)x = 0 \iff Px = x$, we have $N(I-P) = X = R(P)$. The claim $R(I-P) = Y = N(P)$ is shown similarly.

Complementary Subspaces. Proof (cont.). (6) Take $B = \begin{pmatrix}X & Y\end{pmatrix}$, where the columns of $X$ and $Y$ form a basis for $X$ and $Y$, respectively. Then the columns of $B$ form a basis for $V$ and $B$ is nonsingular. From above we have $Px = x$, where $x$ can be any column of $X$. Also, $Py = 0$, where $y$ is any column of $Y$. So $PB = P\begin{pmatrix}X & Y\end{pmatrix} = \begin{pmatrix}X & O\end{pmatrix}$, or $P = \begin{pmatrix}X & O\end{pmatrix}B^{-1} = \begin{pmatrix}X & O\end{pmatrix}\begin{pmatrix}X & Y\end{pmatrix}^{-1}$. This establishes the first part of (6). The second part follows by noting that $\begin{pmatrix}X & Y\end{pmatrix}\begin{pmatrix}I & O\\O & O\end{pmatrix} = \begin{pmatrix}X & O\end{pmatrix}$.

Complementary Subspaces We just saw that any projector is idempotent, i.e., P 2 = P. In fact, Theorem A matrix P is a projector if and only if P 2 = P. Proof. One direction is given above. For the other see [Mey00]. Remark This theorem is sometimes used to define projectors. fasshauer@iit.edu MATH 532 138

Complementary Subspaces. Angle between subspaces. In some applications, e.g., when determining the convergence rates of iterative algorithms, it is useful to know the angle between subspaces. If $\mathcal{R}, \mathcal{N}$ are complementary then $\sin\theta = \frac{1}{\|P\|_2} = \frac{1}{\sqrt{\lambda_{\max}}} = \frac{1}{\sigma_1}$, where $P$ is the projector onto $\mathcal{R}$ along $\mathcal{N}$, $\lambda_{\max}$ is the largest eigenvalue of $P^TP$, and $\sigma_1$ is the largest singular value of $P$. See [Mey00, Example 5.9.2] for more details.

Complementary Subspaces Remark We will skip [Mey00, Section 5.10] on the range nullspace decomposition. While the range nullspace decomposition is theoretically important, its practical usefulness is limited because computation is very unstable due to lack of orthogonality. This also means we will not discuss nilpotent matrices and later on the Jordan normal form. fasshauer@iit.edu MATH 532 140

Orthogonal Decomposition. Definition. Let $V$ be an inner product space and $M\subseteq V$. The orthogonal complement $M^\perp$ of $M$ is $M^\perp = \{x\in V : \langle m,x\rangle = 0 \text{ for all } m\in M\}$. Remark. Even if $M$ is not a subspace of $V$ (i.e., only a subset), $M^\perp$ is (see HW).

Orthogonal Decomposition. Theorem. Let $V$ be an inner product space and $M\subseteq V$. If $M$ is a subspace of $V$, then $V = M\oplus M^\perp$. Proof. According to the definition of complementary subspaces we need to show (1) $M\cap M^\perp = \{0\}$, (2) $M + M^\perp = V$.

Orthogonal Decomposition. Proof (cont.). (1) Let's assume there exists an $x\in M\cap M^\perp$, i.e., $x\in M$ and $x\in M^\perp$. The definition of $M^\perp$ implies $\langle x,x\rangle = 0$. But then the definition of an inner product implies $x = 0$. This is true for any $x\in M\cap M^\perp$, so $x = 0$ is the only such vector.

Orthogonal Decomposition. Proof (cont.). (2) We let $B_M$ and $B_{M^\perp}$ be ON bases for $M$ and $M^\perp$, respectively. Since $M\cap M^\perp = \{0\}$ we know that $B_M\cup B_{M^\perp}$ is an ON basis for some $S\subseteq V$. In fact, $S = V$, since otherwise we could extend $B_M\cup B_{M^\perp}$ to an ON basis of $V$ (using the extension theorem and GS). However, any vector in the extension must be orthogonal to $M$, i.e., in $M^\perp$, but this is not possible since the extended basis must be linearly independent. Therefore, the extension set is empty.

Orthogonal Decomposition. Theorem. Let $V$ be an inner product space with $\dim(V) = n$ and let $M$ be a subspace of $V$. Then (1) $\dim M^\perp = n - \dim M$, (2) $M^{\perp\perp} = M$. Proof. For (1) recall our dimension formula from Chapter 4: $\dim(X+Y) = \dim X + \dim Y - \dim(X\cap Y)$. Here $M\cap M^\perp = \{0\}$, so that $\dim(M\cap M^\perp) = 0$. Also, since $M$ is a subspace of $V$ we have $V = M + M^\perp$, and the dimension formula implies (1).

Orthogonal Decomposition. Proof (cont.). (2) Instead of directly establishing equality we first show that $M^{\perp\perp}\subseteq M$. Since $M\oplus M^\perp = V$, any $x\in V$ can be uniquely decomposed into $x = m + n$ with $m\in M$, $n\in M^\perp$. Now we take $x\in M^{\perp\perp}$, so that $\langle x,n\rangle = 0$ for all $n\in M^\perp$, and therefore $0 = \langle x,n\rangle = \langle m+n,n\rangle = \underbrace{\langle m,n\rangle}_{=0} + \langle n,n\rangle$. But $\langle n,n\rangle = 0 \iff n = 0$, and therefore $x = m$ is in $M$.

Orthogonal Decomposition. Proof (cont.). Now, recall from Chapter 4 that for subspaces $X\subseteq Y$, $\dim X = \dim Y \implies X = Y$. We take $X = M^{\perp\perp}$ and $Y = M$ (and know from the work just performed that $M^{\perp\perp}$ is a subspace of $M$). From (1) we know $\dim M^\perp = n - \dim M$, but then $\dim M^{\perp\perp} = n - \dim M^\perp = n - (n - \dim M) = \dim M$. But then $M^{\perp\perp} = M$.

Orthogonal Decomposition. Back to Fundamental Subspaces. Theorem. Let $A$ be a real $m\times n$ matrix. Then (1) $R(A)^\perp = N(A^T)$, (2) $N(A)^\perp = R(A^T)$. Corollary. $\mathbb{R}^m = R(A)\oplus R(A)^\perp = R(A)\oplus N(A^T)$ and $\mathbb{R}^n = N(A)\oplus N(A)^\perp = N(A)\oplus R(A^T)$.

Orthogonal Decomposition. Proof (of Theorem). (1) We show that $x\in R(A)^\perp$ implies $x\in N(A^T)$ and vice versa: $x\in R(A)^\perp \iff \langle Ay,x\rangle = 0$ for any $y\in\mathbb{R}^n \iff y^TA^Tx = 0$ for any $y\in\mathbb{R}^n \iff \langle y,A^Tx\rangle = 0$ for any $y\in\mathbb{R}^n \iff A^Tx = 0 \iff x\in N(A^T)$, by the definitions of these subspaces and of an inner product. (2) Using (1), we have $R(A)^\perp \overset{(1)}{=} N(A^T) \iff R(A) = N(A^T)^\perp$; replacing $A$ by $A^T$ gives $R(A^T) = N(A)^\perp$.

Starting to think about the SVD
The decompositions of $\mathbb{R}^m$ and $\mathbb{R}^n$ from the corollary help prepare for the SVD of an $m \times n$ matrix $A$. Assume $\operatorname{rank}(A) = r$ and let
$B_{R(A)} = \{u_1, \ldots, u_r\}$ be an ON basis for $R(A) \subseteq \mathbb{R}^m$,
$B_{N(A^T)} = \{u_{r+1}, \ldots, u_m\}$ be an ON basis for $N(A^T) \subseteq \mathbb{R}^m$,
$B_{R(A^T)} = \{v_1, \ldots, v_r\}$ be an ON basis for $R(A^T) \subseteq \mathbb{R}^n$,
$B_{N(A)} = \{v_{r+1}, \ldots, v_n\}$ be an ON basis for $N(A) \subseteq \mathbb{R}^n$.
By the corollary, $B_{R(A)} \cup B_{N(A^T)}$ is an ON basis for $\mathbb{R}^m$ and $B_{R(A^T)} \cup B_{N(A)}$ is an ON basis for $\mathbb{R}^n$, and therefore the following are orthogonal matrices:
$$U = \begin{pmatrix} u_1 & u_2 & \cdots & u_m \end{pmatrix}, \qquad V = \begin{pmatrix} v_1 & v_2 & \cdots & v_n \end{pmatrix}.$$

Consider
$$R = U^T A V = \left( u_i^T A v_j \right)_{i,j=1}^{m,n}.$$
Note that
$$A v_j = 0, \quad j = r+1, \ldots, n, \qquad u_i^T A = 0 \iff A^T u_i = 0, \quad i = r+1, \ldots, m,$$
so
$$R = \begin{pmatrix} u_1^T A v_1 & \cdots & u_1^T A v_r & \\ \vdots & & \vdots & O \\ u_r^T A v_1 & \cdots & u_r^T A v_r & \\ & O & & O \end{pmatrix} = \begin{pmatrix} C_{r \times r} & O \\ O & O \end{pmatrix}.$$

Thus
$$R = U^T A V = \begin{pmatrix} C_{r \times r} & O \\ O & O \end{pmatrix} \quad \iff \quad A = U R V^T = U \begin{pmatrix} C_{r \times r} & O \\ O & O \end{pmatrix} V^T,$$
the URV factorization of $A$.

Remark
The matrix $C_{r \times r}$ is nonsingular since
$$\operatorname{rank}(C) = \operatorname{rank}(U^T A V) = \operatorname{rank}(A) = r,$$
because multiplication by the orthogonal (and therefore nonsingular) matrices $U^T$ and $V$ does not change the rank of $A$.

We have now shown that ON bases for the fundamental subspaces of $A$ yield a URV factorization. As we show next, the converse is also true, i.e., any URV factorization of $A$ yields ON bases for the fundamental subspaces of $A$. However, the URV factorization is not unique: different ON bases result in different factorizations.

Consider $A = U R V^T$ with $U$, $V$ orthogonal $m \times m$ and $n \times n$ matrices, respectively, and $R = \begin{pmatrix} C & O \\ O & O \end{pmatrix}$ with $C$ nonsingular. We partition
$$U = \begin{pmatrix} \underbrace{U_1}_{m \times r} & \underbrace{U_2}_{m \times (m-r)} \end{pmatrix}, \qquad V = \begin{pmatrix} \underbrace{V_1}_{n \times r} & \underbrace{V_2}_{n \times (n-r)} \end{pmatrix}.$$
Then $V$ (and therefore also $V^T$) is nonsingular and we see that
$$R(A) = R(U R V^T) = R(U R) = R\big(\begin{pmatrix} U_1 C & O \end{pmatrix}\big) = R(\underbrace{U_1 C}_{m \times r}) \stackrel{\operatorname{rank}(C)=r}{=} R(U_1), \qquad (12)$$
so that the columns of $U_1$ are an ON basis for $R(A)$.

Moreover,
$$N(A^T) \stackrel{\text{prev. thm.}}{=} R(A)^\perp \stackrel{(12)}{=} R(U_1)^\perp = R(U_2)$$
since $U$ is orthogonal and $\mathbb{R}^m = R(U_1) \oplus R(U_2)$. This implies that the columns of $U_2$ are an ON basis for $N(A^T)$.
The other two cases can be argued similarly, using $N(AB) = N(B)$ provided $\operatorname{rank}(A) = n$.

The main difference between a URV factorization and the SVD is that the SVD will contain a diagonal matrix $\Sigma$ with $r$ nonzero singular values, while $R$ contains the full $r \times r$ block $C$.
As a first step in this direction, we can easily obtain a URV factorization of $A$ with a lower triangular matrix $C$.
Idea: use Householder reflections (or Givens rotations).

Consider an $m \times n$ matrix $A$. We apply an $m \times m$ orthogonal (Householder reflection) matrix $P$ so that
$$PA = \begin{pmatrix} B \\ O \end{pmatrix}, \quad \text{with } r \times n \text{ matrix } B, \ \operatorname{rank}(B) = r.$$
Next, use an $n \times n$ orthogonal $Q$ as follows:
$$Q B^T = \begin{pmatrix} T \\ O \end{pmatrix}, \quad \text{with } r \times r \text{ upper triangular } T, \ \operatorname{rank}(T) = r.$$
Then
$$B Q^T = \begin{pmatrix} T^T & O \end{pmatrix} \quad \iff \quad B = \begin{pmatrix} T^T & O \end{pmatrix} Q$$
and
$$\begin{pmatrix} B \\ O \end{pmatrix} = \begin{pmatrix} T^T & O \\ O & O \end{pmatrix} Q.$$

Together,
$$PA = \begin{pmatrix} T^T & O \\ O & O \end{pmatrix} Q \quad \iff \quad A = P^T \begin{pmatrix} T^T & O \\ O & O \end{pmatrix} Q,$$
a URV factorization with lower triangular block $T^T$.

Remark
See HW for an example of this process with numbers.
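In MATLAB the two orthogonal reductions above can be carried out with two QR factorizations (QR with column pivoting playing the role of the Householder reduction). The following is a minimal sketch under the assumption that $A$ has exact rank $r$; it is not the homework example, and the matrix shown is hypothetical.

```matlab
% Minimal sketch: URV factorization via two QR factorizations,
% assuming A has exact (numerical) rank r.
A = [1 2 3; 2 4 6; 1 0 1];          % hypothetical example, rank 2
[m, n] = size(A);
r = rank(A);

[U, R1, E] = qr(A);                 % A*E = U*R1 (column pivoting)
B = R1(1:r, :);                     % r-by-n block of full row rank

[Q2, R2] = qr(B');                  % B' = Q2*R2, i.e. Q2'*B' = [T; O]
T = R2(1:r, 1:r);                   % r-by-r upper triangular

R = [T' zeros(r, n-r); zeros(m-r, n)];   % lower triangular block T'
V = E * Q2;                         % orthogonal, since E is a permutation

norm(A - U * R * V')                % ~ machine precision
```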

Singular Value Decomposition

We know
$$A = U R V^T = U \begin{pmatrix} C & O \\ O & O \end{pmatrix} V^T,$$
where $C$ is a nonsingular triangular block and $U$, $V$ are orthogonal.
Now we want to establish that $C$ can even be made diagonal.

Note that $\|A\|_2 = \|C\|_2 =: \sigma_1$ since multiplication by an orthogonal matrix does not change the 2-norm (see HW).
Also, $\|C\|_2 = \max_{\|z\|_2 = 1} \|C z\|_2$, so that $\|C\|_2 = \|C x\|_2$ for some $x$ with $\|x\|_2 = 1$.
In fact (see Sect. 5.2), $x$ is such that $(C^T C - \lambda I) x = 0$, i.e., $x$ is an eigenvector of $C^T C$ corresponding to its largest eigenvalue $\lambda$, so that
$$\|C\|_2^2 = \sigma_1^2 = \lambda = x^T C^T C x. \qquad (13)$$

Since $x$ is a unit vector we can extend it to an orthogonal matrix $R_x = \begin{pmatrix} x & X \end{pmatrix}$, e.g., using Householder reflectors as discussed at the end of Sect. 5.6.
Similarly, let
$$y = \frac{C x}{\|C x\|_2} = \frac{C x}{\sigma_1}. \qquad (14)$$
Then $R_y = \begin{pmatrix} y & Y \end{pmatrix}$ is also orthogonal (and Hermitian/symmetric) since it is a Householder reflector.

Now
$$\underbrace{R_y^T}_{= R_y} C R_x = \begin{pmatrix} y^T \\ Y^T \end{pmatrix} C \begin{pmatrix} x & X \end{pmatrix} = \begin{pmatrix} y^T C x & y^T C X \\ Y^T C x & Y^T C X \end{pmatrix}.$$
From above,
$$\sigma_1^2 = \lambda \stackrel{(13)}{=} x^T C^T C x \stackrel{(14)}{=} \sigma_1 \, y^T C x \quad \Longrightarrow \quad y^T C x = \sigma_1.$$
Also,
$$Y^T C x \stackrel{(14)}{=} Y^T (\sigma_1 y) = 0$$
since $R_y$ is orthogonal, i.e., $y$ is orthogonal to the columns of $Y$.

Let $Y^T C X = C_2$ and $y^T C X = c^T$, so that
$$R_y C R_x = \begin{pmatrix} \sigma_1 & c^T \\ 0 & C_2 \end{pmatrix}.$$
To show that $c^T = 0^T$, consider
$$c^T = y^T C X \stackrel{(14)}{=} \left( \frac{C x}{\sigma_1} \right)^T C X = \frac{x^T C^T C X}{\sigma_1}. \qquad (15)$$
From (13), $x$ is an eigenvector of $C^T C$, i.e.,
$$C^T C x = \lambda x = \sigma_1^2 x \quad \Longrightarrow \quad x^T C^T C = \sigma_1^2 x^T.$$
Plugging this into (15) yields $c^T = \sigma_1 x^T X = 0$ since $R_x = \begin{pmatrix} x & X \end{pmatrix}$ is orthogonal.

Moreover, $\sigma_1 \geq \|C_2\|_2$ since
$$\sigma_1 = \|C\|_2 \stackrel{\text{HW}}{=} \|R_y C R_x\|_2 = \max\{\sigma_1, \|C_2\|_2\}.$$
Next, we repeat this process for $C_2$, i.e.,
$$S_y C_2 S_x = \begin{pmatrix} \sigma_2 & 0^T \\ 0 & C_3 \end{pmatrix} \quad \text{with } \sigma_2 \geq \|C_3\|_2.$$
Let
$$P_2 = \begin{pmatrix} 1 & 0^T \\ 0 & S_y^T \end{pmatrix} R_y^T, \qquad Q_2 = R_x \begin{pmatrix} 1 & 0^T \\ 0 & S_x \end{pmatrix}.$$
Then
$$P_2 C Q_2 = \begin{pmatrix} \sigma_1 & 0 & 0^T \\ 0 & \sigma_2 & 0^T \\ 0 & 0 & C_3 \end{pmatrix} \quad \text{with } \sigma_1 \geq \sigma_2 \geq \|C_3\|_2.$$

We continue this until
$$P_{r-1} C Q_{r-1} = \operatorname{diag}(\sigma_1, \sigma_2, \ldots, \sigma_r) = D, \qquad \sigma_1 \geq \sigma_2 \geq \ldots \geq \sigma_r.$$
Finally, let
$$\tilde{U}^T = \begin{pmatrix} P_{r-1} & O \\ O & I \end{pmatrix} U^T \qquad \text{and} \qquad \tilde{V} = V \begin{pmatrix} Q_{r-1} & O \\ O & I \end{pmatrix}.$$
Together,
$$\tilde{U}^T A \tilde{V} = \begin{pmatrix} D & O \\ O & O \end{pmatrix},$$
or, without the tildes, the singular value decomposition (SVD) of $A$,
$$A = U \begin{pmatrix} D & O \\ O & O \end{pmatrix} V^T,$$
where $A$ is $m \times n$, $U$ is $m \times m$, $D$ is $r \times r$, and $V$ is $n \times n$.

We use the following terminology:
singular values: $\sigma_1 \geq \sigma_2 \geq \ldots \geq \sigma_r > 0$,
left singular vectors: columns of $U$,
right singular vectors: columns of $V$.

Remark
In Chapter 7 we will see that the columns of $V$ are eigenvectors of $A^T A$ and the columns of $U$ are eigenvectors of $A A^T$.
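As a quick sanity check, MATLAB's built-in `svd` returns exactly these pieces. A minimal sketch (the matrix is a hypothetical example, not one from the notes):

```matlab
% Minimal sketch: compute an SVD and verify the pieces.
A = [3 1; 1 3; 0 2];            % hypothetical 3-by-2 example
[U, S, V] = svd(A);             % A = U*S*V', U is 3x3, S is 3x2, V is 2x2

sigma = diag(S);                % singular values, in decreasing order
norm(A - U*S*V')                % ~ machine precision

% Connection to Chapter 7: V diagonalizes A'*A with eigenvalues sigma.^2
norm(A'*A - V*diag(sigma.^2)*V')
```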

Geometric interpretation of SVD
For the following we assume $A \in \mathbb{R}^{n \times n}$, $n = 2$.
[Figure: the unit circle spanned by $v_1$, $v_2$ is mapped by $A$ to an ellipse with semi-axes $\sigma_1 u_1$ and $\sigma_2 u_2$.]
This picture is true since
$$A = U D V^T \iff A V = U D$$
and $\sigma_1$, $\sigma_2$ are the lengths of the semi-axes of the ellipse because $\|u_1\| = \|u_2\| = 1$.

Remark
See [Mey00] for more details.
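The statement is easy to verify numerically. A minimal sketch (the 2-by-2 matrix is hypothetical) that samples the unit circle and compares the longest and shortest image vectors with the singular values:

```matlab
% Minimal sketch: A maps the unit circle to an ellipse whose semi-axes
% have lengths sigma_1 and sigma_2.
A = [2 1; 1 1];                     % hypothetical 2-by-2 example
t = linspace(0, 2*pi, 2000);
Z = A * [cos(t); sin(t)];           % images of unit vectors
lens = sqrt(sum(Z.^2, 1));          % lengths of the image vectors
s = svd(A);

[max(lens), s(1)]                   % longest image  ~ sigma_1
[min(lens), s(2)]                   % shortest image ~ sigma_2
```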

For general $n$, $A$ transforms the 2-norm unit sphere to an ellipsoid whose semi-axes have lengths
$$\sigma_1 \geq \sigma_2 \geq \ldots \geq \sigma_n.$$
Therefore,
$$\kappa_2(A) = \frac{\sigma_1}{\sigma_n}$$
is the distortion ratio of the transformation $A$. Moreover,
$$\sigma_1 = \|A\|_2, \qquad \sigma_n = \frac{1}{\|A^{-1}\|_2},$$
so that $\kappa_2(A) = \|A\|_2 \|A^{-1}\|_2$ is the 2-norm condition number of $A$ ($\in \mathbb{R}^{n \times n}$).

Remark
The relations for $\sigma_1$ and $\sigma_n$ hold because
$$\|A\|_2 = \|U D V^T\|_2 \stackrel{\text{HW}}{=} \|D\|_2 = \sigma_1, \qquad \|A^{-1}\|_2 = \|V D^{-1} U^T\|_2 \stackrel{\text{HW}}{=} \|D^{-1}\|_2 = \frac{1}{\sigma_n}.$$

Remark
We always have $\kappa_2(A) \geq 1$, and $\kappa_2(A) = 1$ if and only if $A$ is a multiple of an orthogonal matrix (typo in [Mey00], see proof on next slide).
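These relations are what MATLAB's `cond` computes for the 2-norm. A minimal numerical check with a hypothetical (generically well-conditioned) matrix:

```matlab
% Minimal sketch: the 2-norm, the inverse's 2-norm, and the condition number
% expressed through singular values.
A = randn(5) + 5*eye(5);            % hypothetical nonsingular example
s = svd(A);

[norm(A, 2),       s(1)]            % ||A||_2      = sigma_1
[norm(inv(A), 2),  1/s(end)]        % ||A^{-1}||_2 = 1/sigma_n
[cond(A),          s(1)/s(end)]     % kappa_2(A)   = sigma_1/sigma_n
```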

Proof
"$\Longleftarrow$": Assume $A = \alpha Q$ with $\alpha > 0$, $Q$ orthogonal. Then
$$\|A\|_2 = \alpha \|Q\|_2 = \alpha \max_{\|x\|_2 = 1} \|Q x\|_2 \stackrel{\text{invariance}}{=} \alpha \max_{\|x\|_2 = 1} \|x\|_2 = \alpha.$$
Also,
$$A^T A = \alpha^2 Q^T Q = \alpha^2 I \quad \Longrightarrow \quad A^{-1} = \frac{1}{\alpha^2} A^T,$$
and $\|A^T\|_2 = \|A\|_2$, so that $\|A^{-1}\|_2 = \frac{1}{\alpha}$ and
$$\kappa_2(A) = \|A\|_2 \|A^{-1}\|_2 = \alpha \cdot \frac{1}{\alpha} = 1.$$

Proof (cont.)
"$\Longrightarrow$": Assume $\kappa_2(A) = \frac{\sigma_1}{\sigma_n} = 1$, so that $\sigma_1 = \sigma_n$ and therefore $D = \sigma_1 I$. Thus
$$A = U D V^T = \sigma_1 U V^T$$
and
$$A^T A = \sigma_1^2 (U V^T)^T U V^T = \sigma_1^2 V U^T U V^T = \sigma_1^2 I,$$
i.e., $A$ is the multiple $\sigma_1$ of the orthogonal matrix $U V^T$.

Applications of the Condition Number
Let $\tilde{x}$ be the answer obtained by solving $A x = b$ with $A \in \mathbb{R}^{n \times n}$ nonsingular.
Is a small residual $r = b - A \tilde{x}$ a good indicator for the accuracy of $\tilde{x}$?
Since $x$ is the exact answer and $\tilde{x}$ the computed answer, we measure accuracy by the relative error
$$\frac{\|x - \tilde{x}\|}{\|x\|}.$$

Now
$$r = b - A\tilde{x} = A x - A \tilde{x} = A(x - \tilde{x}) \quad \Longrightarrow \quad \|r\| \leq \|A\| \, \|x - \tilde{x}\|.$$
To get the relative error we divide by $\|b\|$ and use $\|x\| = \|A^{-1} b\| \leq \|A^{-1}\| \, \|b\|$, i.e., $\frac{1}{\|b\|} \leq \frac{\|A^{-1}\|}{\|x\|}$. Then
$$\frac{\|r\|}{\|b\|} \leq \frac{\|A\| \, \|A^{-1}\| \, \|x - \tilde{x}\|}{\|x\|} = \kappa(A) \frac{\|x - \tilde{x}\|}{\|x\|}. \qquad (16)$$

Moreover, using $r = b - A\tilde{x} = b - \tilde{b}$ (with $\tilde{b} = A\tilde{x}$),
$$x - \tilde{x} = A^{-1}(b - \tilde{b}) \quad \Longrightarrow \quad \|x - \tilde{x}\| \leq \|A^{-1}\| \, \|r\|.$$
Dividing by $\|x\|$ and using $\frac{\|A x\|}{\|b\|} = 1$ together with $\|A x\| \leq \|A\| \, \|x\|$, we have
$$\frac{\|x - \tilde{x}\|}{\|x\|} \leq \kappa(A) \frac{\|r\|}{\|b\|}. \qquad (17)$$
Combining (16) and (17) yields
$$\frac{1}{\kappa(A)} \frac{\|r\|}{\|b\|} \leq \frac{\|x - \tilde{x}\|}{\|x\|} \leq \kappa(A) \frac{\|r\|}{\|b\|}.$$
Therefore, the relative residual $\frac{\|r\|}{\|b\|}$ is a good indicator of the relative error if and only if $A$ is well conditioned, i.e., $\kappa(A)$ is small (close to 1).
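A minimal MATLAB sketch of this phenomenon, using the Hilbert matrix as a standard ill-conditioned example (hypothetical for these notes): the relative residual is tiny while the relative error is not.

```matlab
% Minimal sketch: small relative residual, large relative error,
% exactly as the bounds above allow when kappa(A) is huge.
n = 12;
A = hilb(n);                    % Hilbert matrix, severely ill conditioned
x = ones(n, 1);                 % exact solution chosen by us
b = A * x;
xt = A \ b;                     % computed solution

rel_res = norm(b - A*xt) / norm(b)
rel_err = norm(x - xt) / norm(x)
kappa   = cond(A)               % rel_err <= kappa * rel_res, and kappa is large
```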

Applications of the SVD
1. Determination of the numerical rank of $A$: take $\operatorname{rank}(A)$ to be the index of the smallest singular value greater than or equal to a desired threshold.
2. Low-rank approximation of $A$: the Eckart–Young theorem states that
$$A_k = \sum_{i=1}^{k} \sigma_i u_i v_i^T$$
is the best rank-$k$ approximation to $A$ in the 2-norm (also in the Frobenius norm), i.e.,
$$\|A - A_k\|_2 = \min_{\operatorname{rank}(B) = k} \|A - B\|_2.$$
Moreover, $\|A - A_k\|_2 = \sigma_{k+1}$.
Run SVD_movie.m
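A minimal MATLAB sketch of the Eckart–Young statement (the matrix is a hypothetical example, and SVD_movie.m is not reproduced here):

```matlab
% Minimal sketch: the truncated SVD gives the best rank-k approximation,
% with 2-norm error equal to the next singular value.
A = magic(8);                   % hypothetical 8-by-8 example
[U, S, V] = svd(A);
k = 3;
Ak = U(:,1:k) * S(1:k,1:k) * V(:,1:k)';

[norm(A - Ak, 2), S(k+1,k+1)]   % the two numbers agree
```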

3. Stable solution of least squares problems: use the Moore–Penrose pseudoinverse.

Definition
Let $A \in \mathbb{R}^{m \times n}$ and let
$$A = U \begin{pmatrix} D & O \\ O & O \end{pmatrix} V^T$$
be the SVD of $A$. Then
$$A^\dagger = V \begin{pmatrix} D^{-1} & O \\ O & O \end{pmatrix} U^T$$
is called the Moore–Penrose pseudoinverse of $A$.

Remark
Note that $A^\dagger \in \mathbb{R}^{n \times m}$ and
$$A^\dagger = \sum_{i=1}^{r} \frac{v_i u_i^T}{\sigma_i}, \qquad r = \operatorname{rank}(A).$$
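MATLAB's `pinv` implements essentially this definition (up to a rank tolerance). A minimal sketch with a hypothetical 3-by-2 example:

```matlab
% Minimal sketch: the pseudoinverse assembled from the SVD agrees with pinv.
A = [1 2; 2 4; 3 5];                        % hypothetical 3-by-2 example
[U, S, V] = svd(A);
r = rank(A);

Adag = V(:,1:r) * diag(1 ./ diag(S(1:r,1:r))) * U(:,1:r)';
norm(Adag - pinv(A))                        % ~ 0
size(Adag)                                  % n-by-m, here 2-by-3
```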

We now show that the least squares solution of $A x = b$ is given by $x = A^\dagger b$.

Start with the normal equations and use
$$A = U \begin{pmatrix} D & O \\ O & O \end{pmatrix} V^T = \tilde{U} D \tilde{V}^T,$$
the reduced SVD of $A$, i.e., $\tilde{U} \in \mathbb{R}^{m \times r}$, $\tilde{V} \in \mathbb{R}^{n \times r}$. Then
$$A^T A x = A^T b \quad \iff \quad \tilde{V} D \underbrace{\tilde{U}^T \tilde{U}}_{=I} D \tilde{V}^T x = \tilde{V} D \tilde{U}^T b \quad \iff \quad \tilde{V} D^2 \tilde{V}^T x = \tilde{V} D \tilde{U}^T b.$$
Multiplication by $D^{-1} \tilde{V}^T$ yields
$$D \tilde{V}^T x = \tilde{U}^T b.$$

Thus
$$D \tilde{V}^T x = \tilde{U}^T b$$
implies
$$x = \tilde{V} D^{-1} \tilde{U}^T b \quad \iff \quad x = V \begin{pmatrix} D^{-1} & O \\ O & O \end{pmatrix} U^T b \quad \iff \quad x = A^\dagger b.$$

Remark
If $A$ is nonsingular then $A^\dagger = A^{-1}$ (see HW).
If $\operatorname{rank}(A) < n$ (i.e., the least squares solution is not unique), then $x = A^\dagger b$ provides the unique solution with minimum 2-norm (see the justification on the following slide).

Minimum norm solution of underdetermined systems
Note that the general solution of $A x = b$ is given by
$$z = A^\dagger b + n, \qquad n \in N(A).$$
Then
$$\|z\|_2^2 = \|A^\dagger b + n\|_2^2 \stackrel{\text{Pythag. thm}}{=} \|A^\dagger b\|_2^2 + \|n\|_2^2 \geq \|A^\dagger b\|_2^2.$$
The Pythagorean theorem applies since (see HW)
$$A^\dagger b \in R(A^\dagger) = R(A^T),$$
so that, using $R(A^T) = N(A)^\perp$, we have $A^\dagger b \perp n$.
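A minimal MATLAB sketch of the minimum-norm property for a hypothetical underdetermined system:

```matlab
% Minimal sketch: among all solutions of A*z = b, the pseudoinverse solution
% has the smallest 2-norm.
A = [1 2 3; 4 5 6];             % hypothetical 2-by-3 system, rank 2
b = [6; 15];

x = pinv(A) * b;                % minimum-norm solution
N = null(A);                    % ON basis of N(A), here a single column
z = x + 2 * N(:,1);             % another solution, shifted along N(A)

norm(A*z - b)                   % still ~ 0: z solves the system
[norm(x), norm(z)]              % but ||x|| <= ||z||
abs(x' * N)                     % ~ 0: x is orthogonal to N(A)
```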

Remark
Explicit use of the pseudoinverse is usually not recommended. Instead we solve $A x = b$, $A \in \mathbb{R}^{m \times n}$, by
1. computing $A = \tilde{U} D \tilde{V}^T$ (reduced SVD),
2. using $A x = b \iff D \tilde{V}^T x = \tilde{U}^T b$, so we
   - solve $D y = \tilde{U}^T b$ for $y$,
   - compute $x = \tilde{V} y$.
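The recipe translates directly into MATLAB, where `svd(A,'econ')` gives the reduced SVD. A minimal sketch with a hypothetical full-rank overdetermined system:

```matlab
% Minimal sketch: least squares via the reduced SVD, following the two steps
% above, without forming the pseudoinverse explicitly.
A = [1 1; 1 2; 1 3; 1 4];       % hypothetical full-rank 4-by-2 system
b = [1; 2; 2; 4];

[Ut, D, Vt] = svd(A, 'econ');   % reduced SVD: A = Ut*D*Vt'
y = D \ (Ut' * b);              % step 1: solve D*y = Ut'*b (D is diagonal)
x = Vt * y;                     % step 2: x = Vt*y

[norm(x - A\b), norm(x - pinv(A)*b)]   % agrees with backslash and pinv
```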

Other Applications
Also known as principal component analysis (PCA), (discrete) Karhunen–Loève (KL) transformation, Hotelling transform, or proper orthogonal decomposition (POD):
- Data compression
- Noise filtering
- Regularization of inverse problems
- Tomography
- Image deblurring
- Seismology
- Information retrieval and data mining (latent semantic analysis)
- Bioinformatics and computational biology
- Immunology
- Molecular dynamics
- Microarray data analysis

Orthogonal Projections

Earlier we discussed orthogonal complementary subspaces of an inner product space $V$, i.e.,
$$V = M \oplus M^\perp.$$

Definition
Consider $V = M \oplus M^\perp$, so that for every $v \in V$ there exist unique vectors $m \in M$, $n \in M^\perp$ such that
$$v = m + n.$$
Then $m$ is called the orthogonal projection of $v$ onto $M$. The matrix $P_M$ such that $P_M v = m$ is the orthogonal projector onto $M$ along $M^\perp$.

For arbitrary complementary subspaces $\mathcal{X}$, $\mathcal{Y}$ we showed earlier that the projector onto $\mathcal{X}$ along $\mathcal{Y}$ is given by
$$P = \begin{pmatrix} X & O \end{pmatrix} \begin{pmatrix} X & Y \end{pmatrix}^{-1} = \begin{pmatrix} X & Y \end{pmatrix} \begin{pmatrix} I & O \\ O & O \end{pmatrix} \begin{pmatrix} X & Y \end{pmatrix}^{-1},$$
where the columns of $X$ and $Y$ are bases for $\mathcal{X}$ and $\mathcal{Y}$.

Now we let $\mathcal{X} = M$ and $\mathcal{Y} = M^\perp$ be orthogonal complementary subspaces, where the matrices $M$ and $N$ contain basis vectors of $M$ and $M^\perp$ in their columns. Then
$$P = \begin{pmatrix} M & O \end{pmatrix} \begin{pmatrix} M & N \end{pmatrix}^{-1}. \qquad (18)$$
To find $\begin{pmatrix} M & N \end{pmatrix}^{-1}$ we note that $M^T N = N^T M = O$, and if $N$ contains an ON basis in its columns, then
$$\begin{pmatrix} (M^T M)^{-1} M^T \\ N^T \end{pmatrix} \begin{pmatrix} M & N \end{pmatrix} = \begin{pmatrix} I & O \\ O & I \end{pmatrix}$$
(note that $M^T M$ is invertible since $M$ has full rank because its columns form a basis of $M$).

Thus
$$\begin{pmatrix} M & N \end{pmatrix}^{-1} = \begin{pmatrix} (M^T M)^{-1} M^T \\ N^T \end{pmatrix}. \qquad (19)$$
Inserting (19) into (18) yields
$$P_M = \begin{pmatrix} M & O \end{pmatrix} \begin{pmatrix} (M^T M)^{-1} M^T \\ N^T \end{pmatrix} = M (M^T M)^{-1} M^T.$$

Remark
Note that $P_M$ is unique, so this formula holds for an arbitrary basis of $M$ (collected in the matrix $M$). In particular, if $M$ contains an ON basis for $M$, then $P_M = M M^T$.

Similarly,
$$P_{M^\perp} = N (N^T N)^{-1} N^T \quad \text{(arbitrary basis for } M^\perp\text{)}, \qquad P_{M^\perp} = N N^T \quad \text{(ON basis)}.$$
As before, $P_{M^\perp} = I - P_M$.

Example
If $M = \operatorname{span}\{u\}$ with $\|u\| = 1$, then $P_M = P_u = u u^T$ and $P_{u^\perp} = I - P_u = I - u u^T$ (cf. the elementary orthogonal projectors earlier).
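A minimal MATLAB sketch of these formulas for a hypothetical two-dimensional subspace of $\mathbb{R}^3$, checking that the general-basis and ON-basis formulas give the same projector:

```matlab
% Minimal sketch: orthogonal projector onto M = R(M), via a general basis
% and via an ON basis of the same subspace.
M = [1 0; 1 1; 0 1];            % hypothetical basis of a 2D subspace of R^3
P = M / (M'*M) * M';            % P_M = M*(M'*M)^(-1)*M'

Q = orth(M);                    % ON basis for the same subspace
norm(P - Q*Q')                  % ~ 0: same projector

[norm(P - P'), norm(P^2 - P)]   % symmetric and idempotent
norm(eye(3) - P - null(M')*null(M')')   % I - P_M projects onto M^perp
```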

Properties of orthogonal projectors

Theorem
Let $P \in \mathbb{R}^{n \times n}$ be a projector, i.e., $P^2 = P$. Then $P$ is an orthogonal projector if and only if any one of the following equivalent conditions holds:
1. $R(P) \perp N(P)$,
2. $P^T = P$,
3. $\|P\|_2 = 1$.

Proof
1. Follows directly from the definition.
2. "$\Longrightarrow$": Assume $P$ is an orthogonal projector, i.e., $P = M (M^T M)^{-1} M^T$. Then
$$P^T = M \underbrace{\left( (M^T M)^{-1} \right)^T}_{= (M^T M)^{-1}} M^T = P.$$
"$\Longleftarrow$": Assume $P = P^T$. Then
$$R(P) = R(P^T) \stackrel{\text{orth. decomp.}}{=} N(P)^\perp,$$
so that $P$ is an orthogonal projector via (1).

Proof (cont.)
3. For complementary subspaces $\mathcal{X}$, $\mathcal{Y}$ we know that the angle $\theta \in [0, \frac{\pi}{2}]$ between $\mathcal{X}$ and $\mathcal{Y}$ satisfies
$$\|P\|_2 = \frac{1}{\sin \theta}.$$
Assume $P$ is an orthogonal projector; then $\theta = \frac{\pi}{2}$, so that $\|P\|_2 = 1$. Conversely, if $\|P\|_2 = 1$, then $\theta = \frac{\pi}{2}$ and $\mathcal{X}$, $\mathcal{Y}$ are orthogonal complements, i.e., $P$ is an orthogonal projector.

Why is orthogonal projection so important?

Theorem
Let $V$ be an inner product space with subspace $M$, and let $b \in V$. Then
$$\operatorname{dist}(b, M) = \min_{m \in M} \|b - m\|_2 = \|b - P_M b\|_2,$$
i.e., $P_M b$ is the unique vector in $M$ closest to $b$. The quantity $\operatorname{dist}(b, M)$ is called the (orthogonal) distance from $b$ to $M$.

Proof
Let $p = P_M b$. Then $p \in M$ and $p - m \in M$ for every $m \in M$. Moreover,
$$b - p = (I - P_M) b \in M^\perp,$$
so that $(p - m) \perp (b - p)$. Then
$$\|b - m\|_2^2 = \|b - p + p - m\|_2^2 \stackrel{\text{Pythag.}}{=} \|b - p\|_2^2 + \|p - m\|_2^2 \geq \|b - p\|_2^2.$$
Therefore $\min_{m \in M} \|b - m\|_2 = \|b - p\|_2$.

Proof (cont.)
Uniqueness: Assume there exists a $q \in M$ such that
$$\|b - q\|_2 = \|b - p\|_2. \qquad (20)$$
Then
$$\|b - q\|_2^2 = \| \underbrace{b - p}_{\in M^\perp} + \underbrace{p - q}_{\in M} \|_2^2 \stackrel{\text{Pythag.}}{=} \|b - p\|_2^2 + \|p - q\|_2^2.$$
But then (20) implies that $\|p - q\|_2^2 = 0$ and therefore $p = q$.
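A minimal MATLAB sketch of the closest-point property for a hypothetical plane $M$ in $\mathbb{R}^3$:

```matlab
% Minimal sketch: P_M*b is the point of M closest to b.
M = orth([1 1 0; 0 1 1]');      % ON basis of a 2D subspace (plane) in R^3
P = M * M';                     % orthogonal projector onto the plane
b = [1; 2; 3];

d = norm(b - P*b)               % dist(b, M)
m = M * randn(2, 1);            % some other point of the plane
norm(b - m) >= d                % always 1 (true), up to roundoff
```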

Least squares approximation revisited
Now we give a modern derivation of the normal equations (without calculus), and note that much of this remains true for best $L_2$ approximation.
Goal of least squares: for $A \in \mathbb{R}^{m \times n}$, find
$$\min_{x \in \mathbb{R}^n} \sum_{i=1}^{m} \left( (A x)_i - b_i \right)^2 \quad \iff \quad \min_{x \in \mathbb{R}^n} \|A x - b\|_2.$$
Now $A x \in R(A)$, so the least squares error is
$$\operatorname{dist}(b, R(A)) = \min_{A x \in R(A)} \|b - A x\|_2 = \|b - P_{R(A)} b\|_2,$$
with $P_{R(A)}$ the orthogonal projector onto $R(A)$.
