MATH 532: Linear Algebra Chapter 5: Norms, Inner Products and Orthogonality Greg Fasshauer Department of Applied Mathematics Illinois Institute of Technology Spring 2015 fasshauer@iit.edu MATH 532 1
Outline
1 Vector Norms
2 Matrix Norms
3 Inner Product Spaces
4 Orthogonal Vectors
5 Gram–Schmidt Orthogonalization & QR Factorization
6 Unitary and Orthogonal Matrices
7 Orthogonal Reduction
8 Complementary Subspaces
9 Orthogonal Decomposition
10 Singular Value Decomposition
11 Orthogonal Projections
Vector Norms

Definition
Let x, y ∈ Rⁿ (Cⁿ). Then
  xᵀy = Σ_{i=1}^n x_i y_i   (real case),
  x*y = Σ_{i=1}^n x̄_i y_i   (complex case)
is called the standard inner product for Rⁿ (Cⁿ).
Definition
Let V be a vector space. A function ‖·‖ : V → R≥0 is called a norm provided for any x, y ∈ V and α ∈ R
1 ‖x‖ ≥ 0 and ‖x‖ = 0 if and only if x = 0,
2 ‖αx‖ = |α| ‖x‖,
3 ‖x + y‖ ≤ ‖x‖ + ‖y‖.

Remark
The inequality in (3) is known as the triangle inequality.
Remark
Any inner product ⟨·,·⟩ induces a norm via (more later)
  ‖x‖ = √⟨x, x⟩.
We will show that the standard inner product induces the Euclidean norm (cf. length of a vector).

Remark
Inner products let us define angles via
  cos θ = xᵀy / (‖x‖ ‖y‖).
In particular, x, y are orthogonal if and only if xᵀy = 0.
Example
Let x ∈ Rⁿ and consider the Euclidean norm
  ‖x‖₂ = √(xᵀx) = (Σ_{i=1}^n x_i²)^{1/2}.
We show that ‖·‖₂ is a norm. We do this for the real case, but the complex case goes analogously.
1 Clearly, ‖x‖₂ ≥ 0. Also,
  ‖x‖₂ = 0 ⟺ ‖x‖₂² = 0 ⟺ Σ_{i=1}^n x_i² = 0 ⟺ x_i = 0, i = 1, …, n ⟺ x = 0.
Example (cont.)
2 We have
  ‖αx‖₂ = (Σ_{i=1}^n (αx_i)²)^{1/2} = |α| (Σ_{i=1}^n x_i²)^{1/2} = |α| ‖x‖₂.
3 To establish (3) we need

Lemma (Cauchy–Schwarz–Bunyakovsky)
Let x, y ∈ Rⁿ. Then
  |xᵀy| ≤ ‖x‖₂ ‖y‖₂.
Moreover, equality holds if and only if y = αx with α = xᵀy / ‖x‖₂².
Motivation for Proof of Cauchy–Schwarz–Bunyakovsky
As already alluded to above, the angle θ between two vectors a and b is related to the inner product by
  cos θ = aᵀb / (‖a‖ ‖b‖).
Using trigonometry as in the figure, the projection of a onto b is then
  ‖a‖ cos θ (b/‖b‖) = ‖a‖ (aᵀb / (‖a‖ ‖b‖)) (b/‖b‖) = (aᵀb / ‖b‖²) b.
Now, we let y = a and x = b, so that the projection of y onto x is given by αx, where α = xᵀy / ‖x‖².
Proof of Cauchy–Schwarz–Bunyakovsky
We know that ‖y − αx‖₂² ≥ 0 since it's (the square of) a norm. Therefore,
  0 ≤ ‖y − αx‖₂² = (y − αx)ᵀ(y − αx)
    = yᵀy − 2α xᵀy + α² xᵀx
    = yᵀy − 2 (xᵀy / ‖x‖₂²) xᵀy + ((xᵀy)² / ‖x‖₂⁴) xᵀx   [xᵀx = ‖x‖₂²]
    = ‖y‖₂² − (xᵀy)² / ‖x‖₂².
This implies (xᵀy)² ≤ ‖x‖₂² ‖y‖₂², and the Cauchy–Schwarz–Bunyakovsky inequality follows by taking square roots.
Proof (cont.)
Now we look at the equality claim.
⟹: Let's assume that |xᵀy| = ‖x‖₂ ‖y‖₂. But then the first part of the proof shows that ‖y − αx‖₂ = 0, so that y = αx.
⟸: Let's assume y = αx. Then
  |xᵀy| = |xᵀ(αx)| = |α| ‖x‖₂²  and  ‖x‖₂ ‖y‖₂ = ‖x‖₂ ‖αx‖₂ = |α| ‖x‖₂²,
so that we have equality.
Example (cont.)
3 Now we can show that ‖·‖₂ satisfies the triangle inequality:
  ‖x + y‖₂² = (x + y)ᵀ(x + y) = xᵀx + xᵀy + yᵀx + yᵀy
            = ‖x‖₂² + 2 xᵀy + ‖y‖₂²
            ≤ ‖x‖₂² + 2 |xᵀy| + ‖y‖₂²
            ≤ ‖x‖₂² + 2 ‖x‖₂ ‖y‖₂ + ‖y‖₂²   (by CSB)
            = (‖x‖₂ + ‖y‖₂)².
Now we just need to take square roots to have the triangle inequality.
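Both CSB and the triangle inequality are easy to check numerically. A minimal sketch (the variable names and the random test vectors are my own choice, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(5)
y = rng.standard_normal(5)

# Cauchy-Schwarz-Bunyakovsky: |x^T y| <= ||x||_2 ||y||_2
lhs = abs(x @ y)
rhs = np.linalg.norm(x) * np.linalg.norm(y)
assert lhs <= rhs + 1e-12

# triangle inequality for the 2-norm
assert np.linalg.norm(x + y) <= np.linalg.norm(x) + np.linalg.norm(y) + 1e-12

# equality in CSB when the second vector is a multiple of the first:
# z = alpha*x with alpha = x^T y / ||x||^2 as in the lemma
alpha = (x @ y) / (x @ x)
z = alpha * x
assert np.isclose(abs(x @ z), np.linalg.norm(x) * np.linalg.norm(z))
```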
Lemma
Let x, y ∈ Rⁿ. Then we have the backward triangle inequality
  | ‖x‖ − ‖y‖ | ≤ ‖x − y‖.

Proof
We write
  ‖x‖ = ‖x − y + y‖ ≤ ‖x − y‖ + ‖y‖   (triangle inequality).
But this implies
  ‖x‖ − ‖y‖ ≤ ‖x − y‖.
Proof (cont.)
Switch the roles of x and y to get
  ‖y‖ − ‖x‖ ≤ ‖y − x‖ ⟺ −(‖x‖ − ‖y‖) ≤ ‖x − y‖.
Together with the previous inequality we have
  | ‖x‖ − ‖y‖ | ≤ ‖x − y‖.
Other common norms
ℓ₁-norm (or taxi-cab norm, Manhattan norm):
  ‖x‖₁ = Σ_{i=1}^n |x_i|
ℓ∞-norm (or maximum norm, Chebyshev norm):
  ‖x‖∞ = max_{1≤i≤n} |x_i|
ℓ_p-norm:
  ‖x‖_p = (Σ_{i=1}^n |x_i|^p)^{1/p}

Remark
In the homework you will use Hölder's and Minkowski's inequalities to show that the p-norm is a norm.
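The three named norms are the p = 1, 2, ∞ cases of the ℓ_p family; a small sketch comparing them (the test vector is my own example):

```python
import numpy as np

x = np.array([3.0, -4.0, 1.0])

def p_norm(x, p):
    """l_p-norm: (sum |x_i|^p)^(1/p)."""
    return (np.abs(x) ** p).sum() ** (1.0 / p)

norm1   = np.abs(x).sum()          # l1 (taxi-cab): 8
norm2   = np.sqrt((x ** 2).sum())  # l2 (Euclidean): sqrt(26)
norminf = np.abs(x).max()          # l-infinity (Chebyshev): 4

assert np.isclose(norm1, p_norm(x, 1))
assert np.isclose(norm2, p_norm(x, 2))
# ||x||_p approaches ||x||_inf as p grows ...
assert abs(p_norm(x, 100) - norminf) < 1e-6
# ... and the norms are nested: ||x||_inf <= ||x||_2 <= ||x||_1
assert norminf <= norm2 <= norm1
```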
Remark
We now show that ‖x‖∞ = lim_{p→∞} ‖x‖_p.
Let's use tildes to mark all components of x that are maximal, i.e.,
  |x̃₁| = |x̃₂| = … = |x̃_k| = max_{1≤i≤n} |x_i|.
The remaining components are then x_{k+1}, …, x_n. This implies that
  |x_i| / |x̃₁| < 1, for i = k+1, …, n.
Remark (cont.)
Now
  ‖x‖_p = (Σ_{i=1}^n |x_i|^p)^{1/p}
        = |x̃₁| ( k + |x_{k+1}/x̃₁|^p + … + |x_n/x̃₁|^p )^{1/p},
where each of the ratios inside the parentheses is < 1. Since the terms inside the parentheses except for k go to 0 as p → ∞, the factor ( · )^{1/p} → 1 as p → ∞. And so
  ‖x‖_p → |x̃₁| = max_{1≤i≤n} |x_i| = ‖x‖∞.
Figure: Unit balls in R² for the ℓ₁, ℓ₂ and ℓ∞ norms.
Note that B₁ ⊆ B₂ ⊆ B∞ since, e.g., for x = (√2/2, √2/2)ᵀ we have
  ‖x‖₁ = √2/2 + √2/2 = √2,  ‖x‖₂ = 1,  ‖x‖∞ = √2/2,
so that ‖x‖∞ ≤ ‖x‖₂ ≤ ‖x‖₁.
In fact, we have in general (similar to HW)
  ‖x‖∞ ≤ ‖x‖₂ ≤ ‖x‖₁, for any x ∈ Rⁿ.
Norm equivalence
Definition
Two norms ‖·‖ and ‖·‖′ on a vector space V are called equivalent if there exist constants α, β such that
  α ≤ ‖x‖ / ‖x‖′ ≤ β for all x (≠ 0) ∈ V.

Example
‖·‖₁ and ‖·‖₂ are equivalent since from above ‖x‖₂ ≤ ‖x‖₁ and also ‖x‖₁ ≤ √n ‖x‖₂ (see HW), so that
  α = 1 ≤ ‖x‖₁ / ‖x‖₂ ≤ √n = β.

Remark
In fact, all norms on finite-dimensional vector spaces are equivalent.
Matrix Norms

Matrix norms are special norms: they will satisfy one additional property. This property should help us measure ‖AB‖ for two matrices A, B of appropriate sizes.
We look at the simplest matrix norm, the Frobenius norm, defined for A ∈ R^{m×n}:
  ‖A‖_F = (Σ_{i=1}^m Σ_{j=1}^n |a_ij|²)^{1/2} = (Σ_{i=1}^m ‖A_{i*}‖₂²)^{1/2} = (Σ_{j=1}^n ‖A_{*j}‖₂²)^{1/2} = √trace(AᵀA),
i.e., the Frobenius norm is just a 2-norm for the vector that contains all elements of the matrix.
Now
  ‖Ax‖₂² = Σ_{i=1}^m |A_{i*} x|² ≤ Σ_{i=1}^m ‖A_{i*}‖₂² ‖x‖₂² = ‖A‖_F² ‖x‖₂²   (by CSB),
so that
  ‖Ax‖₂ ≤ ‖A‖_F ‖x‖₂.
We can generalize this to matrices, i.e., we have
  ‖AB‖_F ≤ ‖A‖_F ‖B‖_F,
which motivates us to require this submultiplicativity for any matrix norm.
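The two characterizations of the Frobenius norm and the inequalities above can be sketched numerically (the matrices here are my own small examples):

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[0.0, 1.0], [1.0, 1.0]])
x = np.array([1.0, -2.0])

# ||A||_F as the 2-norm of all entries, and via the trace formula
fro = np.sqrt((A ** 2).sum())
assert np.isclose(fro, np.sqrt(np.trace(A.T @ A)))  # ||A||_F = sqrt(trace(A^T A))
assert np.isclose(fro, np.linalg.norm(A, 'fro'))

# compatibility with the vector 2-norm and submultiplicativity
assert np.linalg.norm(A @ x) <= fro * np.linalg.norm(x) + 1e-12
assert np.linalg.norm(A @ B, 'fro') <= fro * np.linalg.norm(B, 'fro') + 1e-12
```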
Definition
A matrix norm is a function ‖·‖ from the set of all real (or complex) matrices of finite size into R≥0 that satisfies
1 ‖A‖ ≥ 0 and ‖A‖ = 0 if and only if A = O (a matrix of all zeros).
2 ‖αA‖ = |α| ‖A‖ for all α ∈ R.
3 ‖A + B‖ ≤ ‖A‖ + ‖B‖ (requires A, B to be of same size).
4 ‖AB‖ ≤ ‖A‖ ‖B‖ (requires A, B to have appropriate sizes).

Remark
This definition is usually too general. In addition to the Frobenius norm, the most useful matrix norms are those induced by a vector norm.
Induced matrix norms
Theorem
Let ‖·‖_(m) and ‖·‖_(n) be vector norms on R^m and Rⁿ, respectively, and let A be an m × n matrix. Then
  ‖A‖ = max_{‖x‖_(n)=1} ‖Ax‖_(m)
is a matrix norm called the induced matrix norm.

Remark
Here the vector norm could be any vector norm, in particular any p-norm. For example, we could have
  ‖A‖₂ = max_{‖x‖_{2,(n)}=1} ‖Ax‖_{2,(m)}.
To keep notation simple we often drop indices.
Proof
1 ‖A‖ ≥ 0 is obvious since this holds for the vector norm. It remains to show that ‖A‖ = 0 if and only if A = O.
Assume A = O; then ‖A‖ = max_{‖x‖=1} ‖Ax‖ = 0 since Ax = 0 for every x.
So now consider A ≠ O. We need to show that ‖A‖ > 0. There must exist a column of A that is not 0. We call this column A_{*k} and take x = e_k. Then
  ‖A‖ = max_{‖x‖=1} ‖Ax‖ ≥ ‖Ae_k‖ = ‖A_{*k}‖ > 0
since ‖e_k‖ = 1 and A_{*k} ≠ 0.
Proof (cont.)
2 Using the corresponding property for the vector norm we have
  ‖αA‖ = max_{‖x‖=1} ‖αAx‖ = |α| max_{‖x‖=1} ‖Ax‖ = |α| ‖A‖.
3 Also straightforward (based on the triangle inequality for the vector norm):
  ‖A + B‖ = max_{‖x‖=1} ‖(A + B)x‖ = max_{‖x‖=1} ‖Ax + Bx‖
          ≤ max_{‖x‖=1} (‖Ax‖ + ‖Bx‖)
          ≤ max_{‖x‖=1} ‖Ax‖ + max_{‖x‖=1} ‖Bx‖ = ‖A‖ + ‖B‖.
Proof (cont.)
4 First note that
  ‖A‖ = max_{‖x‖=1} ‖Ax‖ = max_{x≠0} ‖Ax‖ / ‖x‖,
and so
  ‖Ax‖ ≤ ‖A‖ ‖x‖.   (1)
But then we also have ‖AB‖ ≤ ‖A‖ ‖B‖ since
  ‖AB‖ = max_{‖x‖=1} ‖ABx‖ = ‖ABy‖   (for some y with ‖y‖ = 1)
       ≤ ‖A‖ ‖By‖ ≤ ‖A‖ ‖B‖ ‖y‖ = ‖A‖ ‖B‖   (using (1) twice).
Remark
One can show (see HW) that if A is invertible
  min_{‖x‖=1} ‖Ax‖ = 1 / ‖A⁻¹‖.
The induced matrix norm can be interpreted geometrically:
  ‖A‖: the most a vector on the unit sphere can be stretched when transformed by A.
  1/‖A⁻¹‖: the most a vector on the unit sphere can be shrunk when transformed by A.
Matrix 2-norm
Theorem
Let A be an m × n matrix. Then
1 ‖A‖₂ = max_{‖x‖₂=1} ‖Ax‖₂ = √λ_max,
2 ‖A⁻¹‖₂ = 1 / min_{‖x‖₂=1} ‖Ax‖₂ = 1/√λ_min,
where λ_max and λ_min are the largest and smallest eigenvalues of AᵀA, respectively.

Remark
We also have √λ_max = σ₁, the largest singular value of A, and √λ_min = σ_n, the smallest singular value of A.
Proof
We will show only (1) ((2) goes similarly). The idea is to solve a constrained optimization problem (as in calculus), i.e.,
  maximize f(x) = ‖Ax‖₂² = (Ax)ᵀAx
  subject to g(x) = ‖x‖₂² = xᵀx = 1.
We do this by introducing a Lagrange multiplier λ and defining
  h(x, λ) = f(x) − λ g(x) = xᵀAᵀAx − λ xᵀx.
Proof (cont.)
Necessary and sufficient (since quadratic) condition for a maximum:
  ∂h/∂x_i = 0, i = 1, …, n, together with g(x) = 1.
Now
  ∂/∂x_i (xᵀAᵀAx − λ xᵀx) = (∂x/∂x_i)ᵀAᵀAx + xᵀAᵀA(∂x/∂x_i) − λ (∂x/∂x_i)ᵀx − λ xᵀ(∂x/∂x_i)
    = 2 e_iᵀAᵀAx − 2λ e_iᵀx
    = 2 ((AᵀAx)_i − (λx)_i), i = 1, …, n.
Together this yields
  AᵀAx − λx = 0 ⟺ (AᵀA − λI) x = 0,
so that λ must be an eigenvalue of AᵀA (since g(x) = xᵀx = 1 ensures x ≠ 0).
Proof (cont.)
In fact, as we now show, λ is the maximal eigenvalue. First,
  AᵀAx = λx ⟹ xᵀAᵀAx = λ xᵀx = λ,
so that
  ‖Ax‖₂² = xᵀAᵀAx = λ.
And then
  ‖A‖₂ = max_{‖x‖₂=1} ‖Ax‖₂ = max_{‖x‖₂²=1} √(‖Ax‖₂²) = max √λ = √λ_max.
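The theorem can be checked directly against an eigenvalue and singular value computation; a sketch on a small rectangular matrix (my own example):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 1.0],
              [1.0, 1.0]])

lam = np.linalg.eigvalsh(A.T @ A)          # eigenvalues of A^T A, ascending order
sig = np.linalg.svd(A, compute_uv=False)   # singular values, descending order

# ||A||_2 = sqrt(lambda_max) = sigma_1
assert np.isclose(np.linalg.norm(A, 2), np.sqrt(lam[-1]))
assert np.isclose(sig[0], np.sqrt(lam[-1]))
# sigma_n = sqrt(lambda_min)
assert np.isclose(sig[-1], np.sqrt(lam[0]))
```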
Special properties of the 2-norm
1 ‖A‖₂ = max_{‖x‖₂=1} max_{‖y‖₂=1} |yᵀAx|
2 ‖A‖₂ = ‖Aᵀ‖₂
3 ‖AᵀA‖₂ = ‖A‖₂² = ‖AAᵀ‖₂
4 ‖[A O; O B]‖₂ = max{‖A‖₂, ‖B‖₂}   (block diagonal matrix)
5 ‖UᵀAV‖₂ = ‖A‖₂ provided UUᵀ = I and VᵀV = I (orthogonal matrices).

Remark
The proof is a HW problem.
Matrix 1-norm and ∞-norm
Theorem
Let A be an m × n matrix. Then we have
1 the column sum norm
  ‖A‖₁ = max_{‖x‖₁=1} ‖Ax‖₁ = max_{j=1,…,n} Σ_{i=1}^m |a_ij|,
2 and the row sum norm
  ‖A‖∞ = max_{‖x‖∞=1} ‖Ax‖∞ = max_{i=1,…,m} Σ_{j=1}^n |a_ij|.

Remark
We know these are norms, so what we need to do is verify that the formulas hold. We will show (1).
Proof
First we look at ‖Ax‖₁:
  ‖Ax‖₁ = Σ_{i=1}^m |(Ax)_i| = Σ_{i=1}^m |A_{i*} x| = Σ_{i=1}^m |Σ_{j=1}^n a_ij x_j|
        ≤ Σ_{i=1}^m Σ_{j=1}^n |a_ij| |x_j| = Σ_{j=1}^n |x_j| [Σ_{i=1}^m |a_ij|]
        ≤ [max_{j=1,…,n} Σ_{i=1}^m |a_ij|] Σ_{j=1}^n |x_j|.
Since we actually need to look at ‖Ax‖₁ for ‖x‖₁ = 1 we note that ‖x‖₁ = Σ_{j=1}^n |x_j| = 1 and therefore have
  ‖Ax‖₁ ≤ max_{j=1,…,n} Σ_{i=1}^m |a_ij|.
Proof (cont.)
We even have equality since for x = e_k, where k is the index such that A_{*k} has maximum column sum, we get
  ‖Ax‖₁ = ‖Ae_k‖₁ = ‖A_{*k}‖₁ = Σ_{i=1}^m |a_ik| = max_{j=1,…,n} Σ_{i=1}^m |a_ij|
due to our choice of k. Since ‖e_k‖₁ = 1 we indeed have the desired formula.
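The column-sum and row-sum formulas can be sketched against numpy's built-in induced norms (the matrix is my own example):

```python
import numpy as np

A = np.array([[1.0, -2.0],
              [3.0,  4.0]])

col_sums = np.abs(A).sum(axis=0)   # [1+3, 2+4] = [4, 6]
row_sums = np.abs(A).sum(axis=1)   # [1+2, 3+4] = [3, 7]

# ||A||_1 = largest absolute column sum, ||A||_inf = largest absolute row sum
assert np.isclose(np.linalg.norm(A, 1), col_sums.max())
assert np.isclose(np.linalg.norm(A, np.inf), row_sums.max())
```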
Inner Product Spaces

Definition
A general inner product in a real (complex) vector space V is a symmetric (Hermitian) bilinear form ⟨·,·⟩ : V × V → R (C), i.e.,
1 ⟨x, x⟩ ∈ R≥0 with ⟨x, x⟩ = 0 if and only if x = 0.
2 ⟨x, αy⟩ = α ⟨x, y⟩ for all scalars α.
3 ⟨x, y + z⟩ = ⟨x, y⟩ + ⟨x, z⟩.
4 ⟨x, y⟩ = ⟨y, x⟩ (or ⟨x, y⟩ = conj(⟨y, x⟩) if complex).

Remark
The following two properties (providing bilinearity) are implied (see HW):
  ⟨αx, y⟩ = α ⟨x, y⟩,
  ⟨x + y, z⟩ = ⟨x, z⟩ + ⟨y, z⟩.
As before, any inner product induces a norm via
  ‖·‖ = √⟨·, ·⟩.
One can show (analogous to the Euclidean case) that ‖·‖ is a norm. In particular, we have a general Cauchy–Schwarz–Bunyakovsky inequality
  |⟨x, y⟩| ≤ ‖x‖ ‖y‖.
Example
1 ⟨x, y⟩ = xᵀy (or x*y), the standard inner product for Rⁿ (Cⁿ).
2 For nonsingular matrices A we get the A-inner product on Rⁿ, i.e., ⟨x, y⟩ = xᵀAᵀAy with
  ‖x‖_A = √⟨x, x⟩ = √(xᵀAᵀAx) = ‖Ax‖₂.
3 If V = R^{m×n} (or C^{m×n}) then we get the standard inner product for matrices, i.e., ⟨A, B⟩ = trace(AᵀB) (or trace(A*B)) with
  ‖A‖ = √⟨A, A⟩ = √trace(AᵀA) = ‖A‖_F.
Remark
In the infinite-dimensional setting we have, e.g., for f, g continuous functions on (a, b)
  ⟨f, g⟩ = ∫_a^b f(t) g(t) dt
with
  ‖f‖ = √⟨f, f⟩ = (∫_a^b (f(t))² dt)^{1/2}.
Parallelogram identity
In any inner product space the so-called parallelogram identity holds, i.e.,
  ‖x + y‖² + ‖x − y‖² = 2 (‖x‖² + ‖y‖²).   (2)
This is true since
  ‖x + y‖² + ‖x − y‖² = ⟨x + y, x + y⟩ + ⟨x − y, x − y⟩
    = ⟨x, x⟩ + ⟨x, y⟩ + ⟨y, x⟩ + ⟨y, y⟩ + ⟨x, x⟩ − ⟨x, y⟩ − ⟨y, x⟩ + ⟨y, y⟩
    = 2 ⟨x, x⟩ + 2 ⟨y, y⟩ = 2 (‖x‖² + ‖y‖²).
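A quick numerical sketch: the identity holds for the 2-norm (which comes from an inner product) but fails for, e.g., the 1-norm. The choice x = e₁, y = e₂ is my own test case:

```python
import numpy as np

x = np.array([1.0, 0.0])
y = np.array([0.0, 1.0])

def pg_gap(norm):
    # left-hand side minus right-hand side of the parallelogram identity (2)
    return norm(x + y)**2 + norm(x - y)**2 - 2 * (norm(x)**2 + norm(y)**2)

assert np.isclose(pg_gap(np.linalg.norm), 0.0)              # holds for the 2-norm
assert np.isclose(pg_gap(lambda v: np.linalg.norm(v, 1)), 4.0)  # fails for the 1-norm
```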
Polarization identity
The following theorem shows that we not only get a norm from an inner product (i.e., every Hilbert space is a Banach space), but that if the parallelogram identity holds then we can also get an inner product from a norm (i.e., a Banach space becomes a Hilbert space).

Theorem
Let V be a real vector space with norm ‖·‖. If the parallelogram identity (2) holds then
  ⟨x, y⟩ = (1/4) (‖x + y‖² − ‖x − y‖²)   (3)
is an inner product on V.
Proof
We need to show that all four properties of a general inner product hold.
1 Nonnegativity:
  ⟨x, x⟩ = (1/4) (‖x + x‖² − ‖x − x‖²) = (1/4) ‖2x‖² = ‖x‖² ≥ 0.
Moreover, ⟨x, x⟩ = 0 if and only if x = 0 since ⟨x, x⟩ = ‖x‖².
4 Symmetry: ⟨x, y⟩ = ⟨y, x⟩ is clear since ‖x − y‖ = ‖y − x‖.
Proof (cont.)
3 Additivity: The parallelogram identity implies
  ‖x + y‖² + ‖x + z‖² = (1/2) (‖2x + y + z‖² + ‖y − z‖²)   (4)
and
  ‖x − y‖² + ‖x − z‖² = (1/2) (‖2x − y − z‖² + ‖z − y‖²).   (5)
Subtracting (5) from (4) we get
  ‖x + y‖² − ‖x − y‖² + ‖x + z‖² − ‖x − z‖² = (1/2) (‖2x + y + z‖² − ‖2x − y − z‖²).   (6)
Proof (cont.)
The specific form of the polarized inner product implies
  ⟨x, y⟩ + ⟨x, z⟩ = (1/4) (‖x + y‖² − ‖x − y‖² + ‖x + z‖² − ‖x − z‖²)
    = (1/8) (‖2x + y + z‖² − ‖2x − y − z‖²)   (by (6))
    = (1/2) (‖x + (y + z)/2‖² − ‖x − (y + z)/2‖²)
    = 2 ⟨x, (y + z)/2⟩   (by polarization).   (7)
Setting z = 0 in (7) yields
  ⟨x, y⟩ = 2 ⟨x, y/2⟩   (8)
since ⟨x, 0⟩ = 0.
Proof (cont.)
To summarize, we have
  ⟨x, y⟩ + ⟨x, z⟩ = 2 ⟨x, (y + z)/2⟩   (7)
and
  ⟨x, y⟩ = 2 ⟨x, y/2⟩.   (8)
Since (8) is true for any y ∈ V we can, in particular, replace y by y + z so that we have
  ⟨x, y + z⟩ = 2 ⟨x, (y + z)/2⟩.
This, however, is the right-hand side of (7), so that we end up with
  ⟨x, y + z⟩ = ⟨x, y⟩ + ⟨x, z⟩,
as desired.
Proof (cont.)
2 Scalar multiplication: To show ⟨x, αy⟩ = α ⟨x, y⟩ for integer α we can just repeatedly apply the additivity property just proved. From this we can get the property for rational α as follows. We let α = β/γ with integer β, γ ≠ 0 so that
  βγ ⟨x, y⟩ = ⟨γx, βy⟩ = γ² ⟨x, (β/γ) y⟩.
Dividing by γ² we get
  (β/γ) ⟨x, y⟩ = ⟨x, (β/γ) y⟩.
Proof (cont.)
Finally, for real α we use the continuity of the norm function (see HW), which implies that our inner product ⟨·,·⟩ is also continuous. Now we take a sequence {α_n} of rational numbers such that α_n → α for n → ∞ and have by continuity
  ⟨x, α_n y⟩ → ⟨x, αy⟩ as n → ∞.
Theorem
The only vector p-norm induced by an inner product is the 2-norm.

Remark
Since many problems are more easily dealt with in inner product spaces (where we have lengths and angles, see the next section), the 2-norm has a clear advantage over other p-norms.
Proof
We know that the 2-norm does induce an inner product, i.e., ‖x‖₂ = √(xᵀx). Therefore we need to show that it doesn't work for p ≠ 2. We do this by showing that the parallelogram identity
  ‖x + y‖² + ‖x − y‖² = 2 (‖x‖² + ‖y‖²)
fails for p ≠ 2. We will do this for 1 ≤ p < ∞. You will work out the case p = ∞ in a HW problem.
Proof (cont.)
All we need is a counterexample, so we take x = e₁ and y = e₂, so that
  ‖x + y‖_p² = ‖e₁ + e₂‖_p² = (Σ_{i=1}^n |[e₁ + e₂]_i|^p)^{2/p} = 2^{2/p}
and, similarly,
  ‖x − y‖_p² = ‖e₁ − e₂‖_p² = 2^{2/p}.
Together, the left-hand side of the parallelogram identity is
  2 (2^{2/p}) = 2^{2/p + 1}.
Proof (cont.)
For the right-hand side of the parallelogram identity we calculate
  ‖x‖_p² = ‖e₁‖_p² = 1 = ‖e₂‖_p² = ‖y‖_p²,
so that the right-hand side comes out to 4. Finally, we have
  2^{2/p + 1} = 4 ⟺ 2/p + 1 = 2 ⟺ 2/p = 1 ⟺ p = 2.
Orthogonal Vectors

We will now work in a general inner product space V with induced norm ‖·‖ = √⟨·, ·⟩.

Definition
Two vectors x, y ∈ V are called orthogonal if
  ⟨x, y⟩ = 0.
We often use the notation x ⊥ y.
In the HW you will prove the Pythagorean theorem for the 2-norm and standard inner product xᵀy, i.e.,
  ‖x‖² + ‖y‖² = ‖x − y‖² ⟺ xᵀy = 0.
Moreover, the law of cosines states
  ‖x − y‖² = ‖x‖² + ‖y‖² − 2 ‖x‖ ‖y‖ cos θ,
so that
  cos θ = (‖x‖² + ‖y‖² − ‖x − y‖²) / (2 ‖x‖ ‖y‖) = 2xᵀy / (2 ‖x‖ ‖y‖) = xᵀy / (‖x‖ ‖y‖).
This motivates our general definition of angles:

Definition
Let x, y ∈ V. The angle between x and y is defined via
  cos θ = ⟨x, y⟩ / (‖x‖ ‖y‖), θ ∈ [0, π].
Orthonormal sets
Definition
A set {u₁, u₂, …, u_n} ⊂ V is called orthonormal if ⟨u_i, u_j⟩ = δ_ij (Kronecker delta).

Theorem
Every orthonormal set is linearly independent.

Corollary
Every orthonormal set of n vectors from an n-dimensional vector space V is an orthonormal basis for V.
Proof (of the theorem)
We want to show linear independence, i.e., that
  Σ_{j=1}^n α_j u_j = 0 ⟹ α_j = 0, j = 1, …, n.
To see this is true we take the inner product with u_i:
  ⟨u_i, Σ_{j=1}^n α_j u_j⟩ = ⟨u_i, 0⟩ ⟺ Σ_{j=1}^n α_j ⟨u_i, u_j⟩ = 0 ⟺ α_i = 0,
using ⟨u_i, u_j⟩ = δ_ij. Since i was arbitrary this holds for all i = 1, …, n, and we have linear independence.
Example
The standard orthonormal basis of Rⁿ is given by {e₁, e₂, …, e_n}. Using this basis we can express any x ∈ Rⁿ as
  x = x₁e₁ + x₂e₂ + … + x_n e_n,
i.e., we get a coordinate expansion of x.
In fact, any other orthonormal basis provides just as simple a representation of x. Consider the orthonormal basis B = {u₁, u₂, …, u_n} and assume
  x = Σ_{j=1}^n α_j u_j
for some appropriate scalars α_j. To find these expansion coefficients α_j we take the inner product with u_i, i.e.,
  ⟨u_i, x⟩ = Σ_{j=1}^n α_j ⟨u_i, u_j⟩ = α_i,
using ⟨u_i, u_j⟩ = δ_ij.
We therefore have proved

Theorem
Let {u₁, u₂, …, u_n} be an orthonormal basis for an inner product space V. Then any x ∈ V can be written as
  x = Σ_{i=1}^n ⟨x, u_i⟩ u_i.
This is a (finite) Fourier expansion with Fourier coefficients ⟨x, u_i⟩.
Remark
The classical (infinite-dimensional) Fourier series for continuous functions on (−π, π) uses the orthogonal (but not yet orthonormal) basis
  {1, sin t, cos t, sin 2t, cos 2t, …}
and the inner product
  ⟨f, g⟩ = ∫_{−π}^{π} f(t) g(t) dt.
Example
Consider the basis
  B = {u₁, u₂, u₃} = { (1, 0, 1)ᵀ, (0, 1, 0)ᵀ, (−1, 0, 1)ᵀ }.
It is clear by inspection that B is an orthogonal subset of R³, i.e., using the Euclidean inner product, we have
  u_iᵀu_j = 0, i, j = 1, 2, 3, i ≠ j.
We can obtain an orthonormal basis by normalizing the vectors, i.e., by computing
  v_i = u_i / ‖u_i‖₂, i = 1, 2, 3.
This yields
  v₁ = (1/√2)(1, 0, 1)ᵀ, v₂ = (0, 1, 0)ᵀ, v₃ = (1/√2)(−1, 0, 1)ᵀ.
Example (cont.)
The Fourier expansion of x = (1, 2, 3)ᵀ is given by
  x = Σ_{i=1}^3 (xᵀv_i) v_i = (4/√2) v₁ + 2 v₂ + (2/√2) v₃
    = (2, 0, 2)ᵀ + (0, 2, 0)ᵀ + (−1, 0, 1)ᵀ.
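This example can be replayed numerically. A sketch (the vector signs follow my reading of the worked example; the orthogonality and the reconstruction of x are what matter):

```python
import numpy as np

v1 = np.array([ 1.0, 0.0, 1.0]) / np.sqrt(2)
v2 = np.array([ 0.0, 1.0, 0.0])
v3 = np.array([-1.0, 0.0, 1.0]) / np.sqrt(2)
x  = np.array([ 1.0, 2.0, 3.0])

# Fourier expansion x = sum_i <x, v_i> v_i with coefficients x^T v_i
coeffs = [x @ v for v in (v1, v2, v3)]
expansion = sum(c * v for c, v in zip(coeffs, (v1, v2, v3)))

assert np.allclose(expansion, x)                 # the expansion reproduces x
assert np.isclose(coeffs[0], 4 / np.sqrt(2))     # first Fourier coefficient
```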
Gram–Schmidt Orthogonalization & QR Factorization

We want to convert an arbitrary basis {x₁, x₂, …, x_n} of V to an orthonormal basis {u₁, u₂, …, u_n}.
Idea: construct u₁, u₂, …, u_n successively so that {u₁, u₂, …, u_k} is an ON basis for span{x₁, x₂, …, x_k}, k = 1, …, n.
Construction
k = 1: u₁ = x₁ / ‖x₁‖.
k = 2: Consider the projection of x₂ onto u₁, i.e., ⟨u₁, x₂⟩ u₁. Then
  v₂ = x₂ − ⟨u₁, x₂⟩ u₁
and
  u₂ = v₂ / ‖v₂‖.
In general, consider {u₁, …, u_k} as a given ON basis for span{x₁, …, x_k}. Use the Fourier expansion to express x_{k+1} with respect to {u₁, …, u_{k+1}}:
  x_{k+1} = Σ_{i=1}^{k+1} ⟨u_i, x_{k+1}⟩ u_i = Σ_{i=1}^{k} ⟨u_i, x_{k+1}⟩ u_i + ⟨u_{k+1}, x_{k+1}⟩ u_{k+1}
  ⟹ u_{k+1} = (x_{k+1} − Σ_{i=1}^{k} ⟨u_i, x_{k+1}⟩ u_i) / ⟨u_{k+1}, x_{k+1}⟩ = v_{k+1} / ⟨u_{k+1}, x_{k+1}⟩.
This vector, however, is not yet normalized.
We now want ‖u_{k+1}‖ = 1, i.e.,
  ‖v_{k+1} / ⟨u_{k+1}, x_{k+1}⟩‖ = ‖v_{k+1}‖ / |⟨u_{k+1}, x_{k+1}⟩| = 1 ⟺ ‖v_{k+1}‖ = |⟨u_{k+1}, x_{k+1}⟩|.
Therefore
  ⟨u_{k+1}, x_{k+1}⟩ = ± ‖x_{k+1} − Σ_{i=1}^{k} ⟨u_i, x_{k+1}⟩ u_i‖.
Since the factor ±1 changes neither the span, nor orthogonality, nor normalization, we can pick the positive sign.
Gram–Schmidt algorithm
Summarizing, we have
  u₁ = x₁ / ‖x₁‖,
  v_k = x_k − Σ_{i=1}^{k−1} ⟨u_i, x_k⟩ u_i,  u_k = v_k / ‖v_k‖,  k = 2, …, n.
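The summarized algorithm translates almost line for line into code. A minimal sketch of classical Gram–Schmidt (function name and test matrix are my own):

```python
import numpy as np

def gram_schmidt(X):
    """Classical Gram-Schmidt applied to the columns of X (assumed independent)."""
    X = np.asarray(X, dtype=float)
    U = np.zeros_like(X)
    for k in range(X.shape[1]):
        # v_k = x_k - sum_i <u_i, x_k> u_i
        v = X[:, k] - U[:, :k] @ (U[:, :k].T @ X[:, k])
        U[:, k] = v / np.linalg.norm(v)   # u_k = v_k / ||v_k||
    return U

X = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])
U = gram_schmidt(X)
assert np.allclose(U.T @ U, np.eye(3))   # orthonormal columns
```

Note that the update `U[:, :k] @ (U[:, :k].T @ X[:, k])` is exactly the matrix form U_k U_kᵀ x_k developed on the following slides.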
Using matrix notation to describe Gram–Schmidt
We will assume V ⊆ R^m (but this also works in the complex case). Let
  U₁ = (0, …, 0)ᵀ ∈ R^m
and for k = 2, 3, …, n let
  U_k = (u₁ u₂ ⋯ u_{k−1}) ∈ R^{m×(k−1)}.
Then
  U_kᵀ x_k = (u₁ᵀx_k, u₂ᵀx_k, …, u_{k−1}ᵀx_k)ᵀ
and
  U_k U_kᵀ x_k = (u₁ u₂ ⋯ u_{k−1}) (u₁ᵀx_k, …, u_{k−1}ᵀx_k)ᵀ = Σ_{i=1}^{k−1} u_i (u_iᵀx_k) = Σ_{i=1}^{k−1} (u_iᵀx_k) u_i.
Now, Gram–Schmidt says
  v_k = x_k − Σ_{i=1}^{k−1} (u_iᵀx_k) u_i = x_k − U_k U_kᵀ x_k = (I − U_k U_kᵀ) x_k, k = 1, 2, …, n,
where the case k = 1 is also covered by the special definition of U₁.

Remark
U_k U_kᵀ is a projection matrix, and I − U_k U_kᵀ is a complementary projection. We will cover these later.
QR Factorization (via Gram–Schmidt)
Consider an m × n matrix A with rank(A) = n. We want to convert the set of columns of A, {a₁, a₂, …, a_n}, to an ON basis {q₁, q₂, …, q_n} of R(A). From our discussion of Gram–Schmidt we know
  q₁ = a₁ / ‖a₁‖,
  v_k = a_k − Σ_{i=1}^{k−1} ⟨q_i, a_k⟩ q_i,  q_k = v_k / ‖v_k‖,  k = 2, …, n.
We now rewrite as follows:
  a₁ = ‖a₁‖ q₁,
  a_k = ⟨q₁, a_k⟩ q₁ + … + ⟨q_{k−1}, a_k⟩ q_{k−1} + ‖v_k‖ q_k, k = 2, …, n.
We also introduce the new notation
  r₁₁ = ‖a₁‖, r_kk = ‖v_k‖, k = 2, …, n.
Then
  A = (a₁ a₂ ⋯ a_n) = (q₁ q₂ ⋯ q_n) [ r₁₁ ⟨q₁,a₂⟩ ⋯ ⟨q₁,a_n⟩ ;  r₂₂ ⋯ ⟨q₂,a_n⟩ ;  ⋱ ⋮ ;  O  r_nn ] = QR,
and we have the reduced QR factorization of A.
Remark
The matrix Q is m × n with orthonormal columns. The matrix R is n × n upper triangular with positive diagonal entries. The reduced QR factorization is unique (see HW).
Example
Find the QR factorization of the matrix A = [1 2 0; 0 1 1; 1 0 1].
  q₁ = a₁ / ‖a₁‖ = (1/√2)(1, 0, 1)ᵀ, r₁₁ = ‖a₁‖ = √2,
  r₁₂ = q₁ᵀa₂ = 2/√2 = √2,
  v₂ = a₂ − (q₁ᵀa₂) q₁ = (2, 1, 0)ᵀ − (1, 0, 1)ᵀ = (1, 1, −1)ᵀ, r₂₂ = ‖v₂‖ = √3,
  q₂ = v₂ / ‖v₂‖ = (1/√3)(1, 1, −1)ᵀ.
Example (cont.)
  v₃ = a₃ − (q₁ᵀa₃) q₁ − (q₂ᵀa₃) q₂, with r₁₃ = q₁ᵀa₃ = 1/√2 and r₂₃ = q₂ᵀa₃ = 0.
Thus
  v₃ = (0, 1, 1)ᵀ − (1/2)(1, 0, 1)ᵀ = (1/2)(−1, 2, 1)ᵀ, r₃₃ = ‖v₃‖ = √6/2,
so
  q₃ = v₃ / ‖v₃‖ = (1/√6)(−1, 2, 1)ᵀ.
Example (cont.)
Together we have
  Q = [ 1/√2  1/√3  −1/√6
         0    1/√3   2/√6
        1/√2 −1/√3   1/√6 ],
  R = [ √2  √2  1/√2
         0  √3   0
         0   0  √6/2 ].
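A quick sketch checking the factors of this worked example (entries written as I read them from the example above):

```python
import numpy as np

A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])

s2, s3, s6 = np.sqrt(2), np.sqrt(3), np.sqrt(6)
Q = np.array([[1/s2,  1/s3, -1/s6],
              [0.0,   1/s3,  2/s6],
              [1/s2, -1/s3,  1/s6]])
R = np.array([[s2, s2, 1/s2],
              [0., s3, 0.  ],
              [0., 0., s6/2]])

assert np.allclose(Q @ R, A)            # A = QR
assert np.allclose(Q.T @ Q, np.eye(3))  # Q has orthonormal columns
assert np.all(np.diag(R) > 0)           # positive diagonal of R
```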
Solving linear systems with the QR factorization
Recall the use of the LU factorization to solve Ax = b. Now, A = QR implies
  Ax = b ⟺ QRx = b.
In the special case of a nonsingular n × n matrix A the matrix Q is also n × n with ON columns, so that Q⁻¹ = Qᵀ (since QᵀQ = I) and
  QRx = b ⟺ Rx = Qᵀb.
Therefore we solve Ax = b by the following steps:
1 Compute A = QR.
2 Compute y = Qᵀb.
3 Solve the upper triangular system Rx = y.

Remark
This procedure is comparable to the three-step LU solution procedure.
The real advantage of the QR factorization lies in the solution of least squares problems. Consider Ax = b with A ∈ R^{m×n} and rank(A) = n (so that a unique least squares solution exists). We know that the least squares solution is given by the solution of the normal equations
  AᵀAx = Aᵀb.
Using the QR factorization of A this becomes
  (QR)ᵀQRx = (QR)ᵀb ⟺ Rᵀ QᵀQ Rx = RᵀQᵀb   (QᵀQ = I)
  ⟺ RᵀRx = RᵀQᵀb.
Now R is upper triangular with positive diagonal and therefore invertible. Therefore solving the normal equations corresponds to solving (cf. the previous discussion)
  Rx = Qᵀb.
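The least squares procedure can be sketched with numpy's built-in QR (the overdetermined test problem is my own made-up data):

```python
import numpy as np

# small overdetermined system: A is 4x2 with rank 2
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([0.0, 1.0, 1.0, 3.0])

Q, R = np.linalg.qr(A)            # reduced QR: Q is 4x2, R is 2x2
x = np.linalg.solve(R, Q.T @ b)   # solve Rx = Q^T b

# x satisfies the normal equations A^T A x = A^T b,
# without ever forming the (possibly ill-conditioned) matrix A^T A
assert np.allclose(A.T @ A @ x, A.T @ b)
```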
Remark
This is the same as the QR factorization applied to a square and consistent system Ax = b.

Summary
The QR factorization provides a simple and efficient way to solve least squares problems. The ill-conditioned matrix AᵀA is never computed. If it is required, it can be computed from R as AᵀA = RᵀR (in fact, this is the Cholesky factorization of AᵀA).
Modified Gram–Schmidt
There is still a problem with the QR factorization via Gram–Schmidt: it is not numerically stable (see HW). A better (but still not ideal) approach is provided by the modified Gram–Schmidt algorithm.
Idea: rearrange the order of calculation, i.e., write the projection matrices
  U_k U_kᵀ = Σ_{i=1}^{k−1} u_i u_iᵀ
as a sum of rank-1 projections.
MGS Algorithm
  u₁ ← x₁ / ‖x₁‖, u_j ← x_j, j = 2, …, n
  for k = 2 : n
    E_k = I − u_{k−1} u_{k−1}ᵀ
    for j = k, …, n
      u_j ← E_k u_j
    u_k ← u_k / ‖u_k‖
Gram Schmidt Orthogonalization & QR Factorization Remark The MGS algorithm is theoretically equivalent to the GS algorithm, i.e., in exact arithmetic, but in practice it preserves orthogonality better. Most stable implementations of the QR factorization use Householder reflections or Givens rotations (more later). Householder reflections are also more efficient than MGS. fasshauer@iit.edu MATH 532 86
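A minimal sketch of MGS in code, subtracting one rank-1 projection u uᵀ at a time (function name and random test matrix are my own):

```python
import numpy as np

def modified_gram_schmidt(X):
    """MGS: orthonormalize the columns of X, one rank-1 projector at a time."""
    U = np.asarray(X, dtype=float).copy()
    n = U.shape[1]
    for k in range(n):
        U[:, k] /= np.linalg.norm(U[:, k])
        u = U[:, k]
        for j in range(k + 1, n):
            # apply E = I - u u^T to the remaining columns
            U[:, j] -= u * (u @ U[:, j])
    return U

rng = np.random.default_rng(2)
X = rng.standard_normal((5, 3))       # generically full rank
U = modified_gram_schmidt(X)
assert np.allclose(U.T @ U, np.eye(3))
```

In exact arithmetic this produces the same columns as classical Gram–Schmidt; the difference is that each projection is applied to already partially orthogonalized columns, which is what preserves orthogonality better in floating point.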
Unitary and Orthogonal Matrices

Definition
A real (complex) n × n matrix is called orthogonal (unitary) if its columns form an orthonormal basis for Rⁿ (Cⁿ).

Theorem
Let U be an orthogonal n × n matrix. Then
1 U has orthonormal rows.
2 U⁻¹ = Uᵀ.
3 ‖Ux‖₂ = ‖x‖₂ for all x ∈ Rⁿ, i.e., U is an isometry.

Remark
Analogous properties for unitary matrices are formulated and proved in [Mey00].
Proof
2 By definition U = (u₁ ⋯ u_n) has orthonormal columns, i.e.,
  u_i ⊥ u_j ⟺ u_iᵀu_j = δ_ij ⟺ (UᵀU)_ij = δ_ij ⟺ UᵀU = I.
But UᵀU = I implies Uᵀ = U⁻¹.
1 Therefore the statement about orthonormal rows follows from UU⁻¹ = UUᵀ = I.
Proof (cont.)
3 To show that U is an isometry we assume U is orthogonal. Then, for any x ∈ Rⁿ,
  ‖Ux‖₂² = (Ux)ᵀ(Ux) = xᵀ UᵀU x = xᵀx = ‖x‖₂²,
using UᵀU = I.
Remark
The converse of (3) is also true, i.e., if ‖Ux‖₂ = ‖x‖₂ for all x ∈ Rⁿ then U must be orthogonal.
Consider x = e_i. Then
  ‖Ue_i‖₂² = u_iᵀu_i = ‖e_i‖₂² = 1   (by (3)),
so the columns of U have norm 1. Moreover, for x = e_i + e_j (i ≠ j) we get
  ‖U(e_i + e_j)‖₂² = u_iᵀu_i + u_iᵀu_j + u_jᵀu_i + u_jᵀu_j = 2 + 2 u_iᵀu_j = ‖e_i + e_j‖₂² = 2   (by (3)),
so that u_iᵀu_j = 0 for i ≠ j and the columns of U are orthogonal.
Example
The simplest orthogonal matrix is the identity matrix I.
Permutation matrices are orthogonal, e.g.,
  P = [1 0 0; 0 0 1; 0 1 0].
In fact, for this permutation matrix we even have Pᵀ = P, so that PᵀP = P² = I. Such matrices are called involutory (see pretest).
An orthogonal matrix can be viewed as a unitary matrix, but a unitary matrix may not be orthogonal. For example, for
  A = (1/√2) [1 i; 1 −i]
we have A*A = AA* = I, but AᵀA ≠ I ≠ AAᵀ.
Elementary Orthogonal Projectors
Definition
A matrix Q of the form
  Q = I − uuᵀ, u ∈ Rⁿ, ‖u‖₂ = 1,
is called an elementary orthogonal projector.

Remark
Note that Q is not an orthogonal matrix. It is, however, symmetric:
  Qᵀ = (I − uuᵀ)ᵀ = I − uuᵀ = Q.
All projectors are idempotent, i.e., Q² = Q:
  QᵀQ = Q² = (I − uuᵀ)(I − uuᵀ) = I − 2uuᵀ + u (uᵀu) uᵀ = I − uuᵀ = Q,
using uᵀu = 1.
Geometric interpretation
Consider x = (I − Q)x + Qx and observe that (I − Q)x ⊥ Qx:
  ((I − Q)x)ᵀ Qx = xᵀ(I − Qᵀ)Qx = xᵀ(Q − QᵀQ)x = 0,
since QᵀQ = Q. Also,
  (I − Q)x = (uuᵀ)x = u(uᵀx) ∈ span{u}.
Therefore Qx ∈ u⊥, the orthogonal complement of u. Also note that (uᵀx)u = (uᵀx) u with ‖u‖₂ = 1, so that |uᵀx| is the length of the orthogonal projection of x onto span{u}.
Summary
(I − Q)x ∈ span{u}, so I − Q = uuᵀ = P_u is a projection onto span{u}.
Qx ∈ u⊥, so Q = I − uuᵀ = P_{u⊥} is a projection onto u⊥.
Remark
Above we assumed that ‖u‖₂ = 1. For an arbitrary vector v we get a unit vector
  u = v / ‖v‖₂ = v / √(vᵀv).
Therefore, for general v,
  P_v = vvᵀ / (vᵀv) is a projection onto span{v},
  P_{v⊥} = I − P_v = I − vvᵀ / (vᵀv) is a projection onto v⊥.
Elementary Reflectors
Definition. Let $v (\neq 0) \in \mathbb{R}^n$. Then
\[ R = I - 2\frac{vv^T}{v^Tv} \]
is called the elementary (or Householder) reflector about the hyperplane $v^\perp$.
Remark. For $u \in \mathbb{R}^n$ with $\|u\|_2 = 1$ we have $R = I - 2uu^T$.
Geometric interpretation
Consider $\|u\|_2 = 1$, and note that $Qx = (I - uu^T)x$ is the orthogonal projection of $x$ onto $u^\perp$ as above. Also,
\[ Q(Rx) = Q(I - 2uu^T)x = Q(I - 2(I-Q))x = (Q - 2Q + 2\underbrace{Q^2}_{=Q})x = Qx, \]
so that $Qx$ is also the orthogonal projection of $Rx$ onto $u^\perp$. Moreover,
\[ \|x - Qx\| = \|x - (I - uu^T)x\| = |u^Tx|\,\|u\| = |u^Tx| \]
and
\[ \|Qx - Rx\| = \|(Q - R)x\| = \|\big(I - uu^T - (I - 2uu^T)\big)x\| = \|uu^Tx\| = |u^Tx|. \]
Together, $Rx$ is the reflection of $x$ about $u^\perp$.
Properties of elementary reflectors
Theorem. Let $R$ be an elementary reflector. Then $R^{-1} = R^T = R$, i.e., $R$ is orthogonal, symmetric, and involutory.
Remark. However, these properties do not characterize a reflector, i.e., an orthogonal, symmetric and involutory matrix is not necessarily a reflector (see HW).
Proof. $R^T = (I - 2uu^T)^T = I - 2uu^T = R$. Also,
\[ R^2 = (I - 2uu^T)(I - 2uu^T) = I - 4uu^T + 4u\underbrace{u^Tu}_{=1}u^T = I, \]
so that $R^{-1} = R$. $\square$
Reflection of $x$ onto $e_1$
If we can construct a matrix $R$ such that $Rx = \alpha e_1$, then we can use $R$ to zero out entries in (the first column of) a matrix. To this end consider
\[ v = x \pm \mu\|x\|_2e_1, \qquad \mu = \begin{cases} 1 & \text{if } x_1 \text{ real}, \\ \dfrac{x_1}{|x_1|} & \text{if } x_1 \text{ complex}, \end{cases} \]
and note
\[ v^Tv = (x \pm \mu\|x\|_2e_1)^T(x \pm \mu\|x\|_2e_1) = x^Tx \pm 2\mu\|x\|_2e_1^Tx + \underbrace{\mu^2}_{=1}\|x\|_2^2 = 2(x^Tx \pm \mu\|x\|_2e_1^Tx) = 2v^Tx. \quad (9) \]
Our Householder reflector was defined as
\[ R = I - 2\frac{vv^T}{v^Tv}, \]
so that
\[ Rx = x - 2\frac{vv^Tx}{v^Tv} = x - \underbrace{\frac{2v^Tx}{v^Tv}}_{\stackrel{(9)}{=}1}v = x - v = \underbrace{\mp\mu\|x\|_2e_1}_{=\alpha e_1}. \]
Remark. These special reflectors are used in the Householder variant of the QR factorization. For optimal numerical stability with real matrices one lets $\mu = \operatorname{sign}(x_1)$.
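The computation $Rx = x - v = \mp\mu\|x\|_2e_1$ can be verified numerically. A minimal pure-Python sketch for the real case, taking the $+$ sign and $\mu = \operatorname{sign}(x_1)$ as recommended above (the test vector is an arbitrary choice, not from the slides):

```python
import math

# With v = x + sign(x_1) ||x||_2 e_1, the reflector R = I - 2 v v^T / (v^T v)
# maps x to -sign(x_1) ||x||_2 e_1.
x = [3.0, 1.0, 2.0]
n = len(x)
norm_x = math.sqrt(sum(c * c for c in x))
mu = 1.0 if x[0] >= 0 else -1.0               # sign(x_1), the stable choice
v = x[:]
v[0] += mu * norm_x

# Rx = x - 2 (v^T x / v^T v) v; by (9), 2 v^T x = v^T v, so the coefficient is 1.
vtv = sum(c * c for c in v)
coef = 2.0 * sum(v[i] * x[i] for i in range(n)) / vtv
Rx = [x[i] - coef * v[i] for i in range(n)]

assert abs(Rx[0] + mu * norm_x) < 1e-12       # first entry is -mu * ||x||_2
assert all(abs(c) < 1e-12 for c in Rx[1:])    # remaining entries are zeroed
```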
Remark. Since $R^2 = I$ (i.e., $R^{-1} = R$) we have, whenever $\|x\|_2 = 1$,
\[ Rx = \mu e_1 \;\Longrightarrow\; R^2x = \mu Re_1 \;\Longrightarrow\; x = \mu Re_1 = \mu r_1, \]
i.e., $x$ is $\mu$ times the first column of $R$. Therefore the matrix $U = R$ (taking $\mu = 1$) is orthogonal (since $R$ is) and contains $x$ as its first column. Thus, this allows us to construct an ON basis for $\mathbb{R}^n$ that contains $x$ (see example in [Mey00]).
Rotations
We give only a brief overview (more details can be found in [Mey00]). We begin in $\mathbb{R}^2$ and look for a matrix representation of the rotation of a vector $u$ into another vector $v$, counterclockwise by an angle $\theta$. Here
\[ u = \begin{pmatrix} u_1 \\ u_2 \end{pmatrix} = \begin{pmatrix} \|u\|\cos\varphi \\ \|u\|\sin\varphi \end{pmatrix} \quad (10) \qquad \text{and} \qquad v = \begin{pmatrix} v_1 \\ v_2 \end{pmatrix} = \begin{pmatrix} \|v\|\cos(\varphi+\theta) \\ \|v\|\sin(\varphi+\theta) \end{pmatrix}. \quad (11) \]
We use the trigonometric identities
\[ \cos(A+B) = \cos A\cos B - \sin A\sin B, \qquad \sin(A+B) = \sin A\cos B + \sin B\cos A \]
with $A = \varphi$, $B = \theta$ and $\|v\| = \|u\|$ to get
\[ v \stackrel{(11)}{=} \begin{pmatrix} \|v\|\cos(\varphi+\theta) \\ \|v\|\sin(\varphi+\theta) \end{pmatrix} = \begin{pmatrix} \|u\|(\cos\varphi\cos\theta - \sin\varphi\sin\theta) \\ \|u\|(\sin\varphi\cos\theta + \sin\theta\cos\varphi) \end{pmatrix} \stackrel{(10)}{=} \begin{pmatrix} u_1\cos\theta - u_2\sin\theta \\ u_2\cos\theta + u_1\sin\theta \end{pmatrix} = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}u = Pu, \]
where $P$ is the rotation matrix.
Remark. Note that
\[ P^TP = \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix}\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} = \begin{pmatrix} \cos^2\theta + \sin^2\theta & -\cos\theta\sin\theta + \cos\theta\sin\theta \\ -\cos\theta\sin\theta + \cos\theta\sin\theta & \sin^2\theta + \cos^2\theta \end{pmatrix} = I, \]
so that $P$ is an orthogonal matrix. $P^T$ is also a rotation matrix (by an angle $-\theta$).
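A quick numerical check of both facts, rotating $(1,0)^T$ by an arbitrary angle (the value of $\theta$ below is just a test choice):

```python
import math

# P rotates (1, 0) by theta to (cos theta, sin theta), and P^T P = I.
theta = 0.7
c, s = math.cos(theta), math.sin(theta)
P = [[c, -s], [s, c]]

u = [1.0, 0.0]
Pu = [P[0][0] * u[0] + P[0][1] * u[1], P[1][0] * u[0] + P[1][1] * u[1]]
assert abs(Pu[0] - c) < 1e-12 and abs(Pu[1] - s) < 1e-12

PtP = [[sum(P[k][i] * P[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
assert all(abs(PtP[i][j] - (1.0 if i == j else 0.0)) < 1e-12
           for i in range(2) for j in range(2))
```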
Rotations about a coordinate axis in $\mathbb{R}^3$ are very similar. Such rotations are referred to as plane rotations. For example, rotation about the $x$-axis (in the $yz$-plane) is accomplished with
\[ P_{yz} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & -\sin\theta \\ 0 & \sin\theta & \cos\theta \end{pmatrix}. \]
Rotation about the $y$- and $z$-axes is done analogously.
We can use the same ideas for plane rotations in higher dimensions.
Definition. An orthogonal matrix $P_{ij}$ that agrees with the identity except for the four entries
\[ [P_{ij}]_{ii} = [P_{ij}]_{jj} = c, \qquad [P_{ij}]_{ij} = s, \qquad [P_{ij}]_{ji} = -s, \]
with $c^2 + s^2 = 1$, is called a plane rotation (or Givens rotation). Note that the orientation is reversed from the earlier discussion.
Usually we set
\[ c = \frac{x_i}{\sqrt{x_i^2 + x_j^2}}, \qquad s = \frac{x_j}{\sqrt{x_i^2 + x_j^2}}, \]
since then, for $x = (x_1 \cdots x_n)^T$,
\[ P_{ij}x = \Big(x_1, \ldots, \underbrace{cx_i + sx_j}_{=\sqrt{x_i^2 + x_j^2}}, \ldots, \underbrace{-sx_i + cx_j}_{=0}, \ldots, x_n\Big)^T. \]
This shows that $P_{ij}$ zeros the $j$th component of $x$.

Note that the $i$th component becomes $\sqrt{x_i^2 + x_j^2}$, so that repeatedly applying Givens rotations $P_{ij}$ with the same $i$, but different values of $j$, will zero out all but the $i$th component of $x$, and that component will become $\sqrt{x_1^2 + \cdots + x_n^2} = \|x\|_2$. Therefore, the sequence
\[ P = P_{in}\cdots P_{i,i+1}P_{i,i-1}\cdots P_{i1} \]
of Givens rotations rotates the vector $x \in \mathbb{R}^n$ onto $e_i$, i.e., $Px = \|x\|_2e_i$. Moreover, the matrix $P$ is orthogonal.
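The sweep described above can be sketched in a few lines of pure Python (taking $i = 1$, with an arbitrary test vector):

```python
import math

def givens_zero(x, i, j):
    """Apply the plane rotation P_ij (with c = x_i/r, s = x_j/r) to x in place,
    sending (x_i, x_j) to (r, 0) where r = sqrt(x_i^2 + x_j^2)."""
    r = math.hypot(x[i], x[j])
    if r == 0.0:
        return
    c, s = x[i] / r, x[j] / r
    x[i], x[j] = c * x[i] + s * x[j], -s * x[i] + c * x[j]

x = [1.0, 2.0, -2.0, 4.0]
norm_x = math.sqrt(sum(c * c for c in x))     # ||x||_2 before rotating

for j in range(1, len(x)):                    # P_{1n} ... P_{12} applied in turn
    givens_zero(x, 0, j)

assert abs(x[0] - norm_x) < 1e-12             # x rotated onto ||x||_2 e_1
assert all(abs(c) < 1e-12 for c in x[1:])
```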
Remark. Givens rotations can be used as an alternative to Householder reflections to construct a QR factorization. Householder reflections are in general more efficient, but for sparse matrices Givens rotations are more efficient because they can be applied more selectively.
Orthogonal Reduction
Recall the form of LU factorization (Gaussian elimination):
\[ T_{n-1}\cdots T_2T_1A = U, \]
where the $T_k$ are lower triangular and $U$ is upper triangular, i.e., we have a triangular reduction. For the QR factorization we will use orthogonal Householder reflectors $R_k$ to get
\[ R_{n-1}\cdots R_2R_1A = T, \]
where $T$ is upper triangular, i.e., we have an orthogonal reduction.
Recall Householder reflectors
\[ R = I - 2\frac{vv^T}{v^Tv}, \qquad v = x \pm \mu\|x\|e_1, \]
so that
\[ Rx = \mp\mu\|x\|e_1, \]
with $\mu = 1$ for real $x$. Now we explain how to use these Householder reflectors to convert an $m \times n$ matrix $A$ to an upper triangular matrix of the same size, i.e., how to do a full QR factorization.
Apply a Householder reflector to the first column $A_1$ of $A$:
\[ R_1A_1 = \Big(I - 2\frac{vv^T}{v^Tv}\Big)A_1, \qquad v = A_1 \pm \|A_1\|e_1, \]
so that
\[ R_1A_1 = \mp\|A_1\|e_1 = \begin{pmatrix} t_{11} \\ 0 \\ \vdots \\ 0 \end{pmatrix}. \]
Then $R_1$ applied to all of $A$ yields
\[ R_1A = \begin{pmatrix} t_{11} & t_{12} & \cdots & t_{1n} \\ 0 & & & \\ \vdots & & A_2 & \\ 0 & & & \end{pmatrix} = \begin{pmatrix} t_{11} & t_1^T \\ 0 & A_2 \end{pmatrix}. \]
Next, we apply the same idea to $A_2$, i.e., we let
\[ R_2 = \begin{pmatrix} 1 & 0^T \\ 0 & \hat R_2 \end{pmatrix}. \]
Then
\[ R_2R_1A = \begin{pmatrix} t_{11} & t_1^T \\ 0 & \hat R_2A_2 \end{pmatrix} = \begin{pmatrix} t_{11} & t_{12} & \cdots & t_{1n} \\ 0 & t_{22} & t_2^T \\ 0 & 0 & A_3 \end{pmatrix}. \]
We continue the process until we get an upper triangular matrix, i.e.,
\[ \underbrace{R_n\cdots R_2R_1}_{=P}A = \begin{pmatrix} t_{11} & & * \\ & \ddots & \\ & & t_{nn} \\ & O & \end{pmatrix} \text{ whenever } m > n, \qquad \underbrace{R_m\cdots R_2R_1}_{=P}A = \begin{pmatrix} t_{11} & & * & \\ & \ddots & & O \\ & & t_{mm} & \end{pmatrix} \text{ whenever } n > m. \]
Since each $R_k$ is orthogonal (unitary for complex $A$) we have $PA = T$ with $P$ $m \times m$ orthogonal and $T$ $m \times n$ upper triangular, i.e.,
\[ A = QR \qquad (Q = P^T, \; R = T). \]
Remark. This is similar to obtaining the QR factorization via MGS, but now $Q$ is orthogonal (square) and $R$ is rectangular. This gives us the full QR factorization, whereas MGS gave us the reduced QR factorization (with $m \times n$ $Q$ and $n \times n$ $R$).
Example. We use Householder reflections to find the QR factorization (where $R$ has positive diagonal elements) of
\[ A = \begin{pmatrix} 1 & 2 & 0 \\ 0 & 1 & 1 \\ 1 & 0 & 1 \end{pmatrix}. \]
We take
\[ R_1 = I - 2\frac{v_1v_1^T}{v_1^Tv_1}, \qquad v_1 = A_1 \pm \|A_1\|e_1, \qquad \text{so that} \qquad R_1A_1 = \mp\|A_1\|e_1 = \begin{pmatrix} \mp\sqrt{2} \\ 0 \\ 0 \end{pmatrix}. \]
Thus we take the $-$ sign, i.e., $v_1 = A_1 - \sqrt{2}e_1 = (1-\sqrt{2}, 0, 1)^T$, so that $t_{11} = \sqrt{2} > 0$.

To find $R_1A$ we can either compute $R_1$ using the formula above and then compute the matrix-matrix product, or more cheaply note that
\[ R_1x = \Big(I - 2\frac{v_1v_1^T}{v_1^Tv_1}\Big)x = x - \frac{2v_1^Tx}{v_1^Tv_1}v_1, \]
so that we can compute $v_1^TA_j$, $j = 2, 3$, instead of the full $R_1$:
\[ v_1^TA_2 = (1-\sqrt{2})\cdot 2 + 0\cdot 1 + 1\cdot 0 = 2 - 2\sqrt{2}, \qquad v_1^TA_3 = (1-\sqrt{2})\cdot 0 + 0\cdot 1 + 1\cdot 1 = 1. \]
Also
\[ \frac{2}{v_1^Tv_1}v_1 = \frac{1}{2-\sqrt{2}}\begin{pmatrix} 1-\sqrt{2} \\ 0 \\ 1 \end{pmatrix}. \]
Therefore
\[ R_1A_2 = \begin{pmatrix} 2 \\ 1 \\ 0 \end{pmatrix} - \frac{2-2\sqrt{2}}{2-\sqrt{2}}\begin{pmatrix} 1-\sqrt{2} \\ 0 \\ 1 \end{pmatrix} = \begin{pmatrix} \sqrt{2} \\ 1 \\ \sqrt{2} \end{pmatrix}, \qquad R_1A_3 = \begin{pmatrix} 0 \\ 1 \\ 1 \end{pmatrix} - \frac{1}{2-\sqrt{2}}\begin{pmatrix} 1-\sqrt{2} \\ 0 \\ 1 \end{pmatrix} = \begin{pmatrix} \sqrt{2}/2 \\ 1 \\ -\sqrt{2}/2 \end{pmatrix}, \]
so that
\[ R_1A = \begin{pmatrix} \sqrt{2} & \sqrt{2} & \sqrt{2}/2 \\ 0 & 1 & 1 \\ 0 & \sqrt{2} & -\sqrt{2}/2 \end{pmatrix}, \qquad A_2 = \begin{pmatrix} 1 & 1 \\ \sqrt{2} & -\sqrt{2}/2 \end{pmatrix}. \]
Next,
\[ \hat R_2x = x - \frac{2v_2^Tx}{v_2^Tv_2}v_2 \qquad \text{with} \qquad v_2 = (A_2)_1 - \|(A_2)_1\|e_1 = \begin{pmatrix} 1-\sqrt{3} \\ \sqrt{2} \end{pmatrix}, \]
so that
\[ v_2^T(A_2)_1 = 3 - \sqrt{3}, \qquad v_2^T(A_2)_2 = -\sqrt{3}, \qquad \frac{2}{v_2^Tv_2}v_2 = \frac{1}{3-\sqrt{3}}\begin{pmatrix} 1-\sqrt{3} \\ \sqrt{2} \end{pmatrix}, \]
and
\[ \hat R_2(A_2)_1 = \begin{pmatrix} \sqrt{3} \\ 0 \end{pmatrix}, \qquad \hat R_2(A_2)_2 = \begin{pmatrix} 0 \\ \sqrt{6}/2 \end{pmatrix}. \]
Using
\[ R_2 = \begin{pmatrix} 1 & 0^T \\ 0 & \hat R_2 \end{pmatrix} \]
we get
\[ \underbrace{R_2R_1}_{=P}A = \begin{pmatrix} \sqrt{2} & \sqrt{2} & \sqrt{2}/2 \\ 0 & \sqrt{3} & 0 \\ 0 & 0 & \sqrt{6}/2 \end{pmatrix} = T. \]
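The worked example above can be reproduced numerically. A pure-Python sketch (the sign convention $v = x - \|x\|e_1$ is chosen so the diagonal of $T$ is positive, matching the example; a production code would pick the sign for stability instead):

```python
import math

def house_apply(v, B):
    """Overwrite B with (I - 2 v v^T / v^T v) B."""
    vtv = sum(c * c for c in v)
    n = len(B)
    for j in range(len(B[0])):
        coef = 2.0 * sum(v[i] * B[i][j] for i in range(n)) / vtv
        for i in range(n):
            B[i][j] -= coef * v[i]

A = [[1.0, 2.0, 0.0], [0.0, 1.0, 1.0], [1.0, 0.0, 1.0]]
R = [row[:] for row in A]

for k in range(2):                           # two reflection steps for 3x3
    col = [0.0] * k + [R[i][k] for i in range(k, 3)]
    nrm = math.sqrt(sum(c * c for c in col))
    v = col[:]
    v[k] -= nrm                              # v = x - ||x|| e_k, so t_kk = +nrm
    house_apply(v, R)

# Diagonal matches the hand computation: sqrt(2), sqrt(3), sqrt(6)/2.
assert abs(R[0][0] - math.sqrt(2)) < 1e-12
assert abs(R[1][1] - math.sqrt(3)) < 1e-12
assert abs(R[2][2] - math.sqrt(6) / 2) < 1e-12
assert abs(R[1][0]) < 1e-12 and abs(R[2][0]) < 1e-12 and abs(R[2][1]) < 1e-12
```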
Remark. As mentioned earlier, the factor $R$ of the QR factorization is given by the matrix $T$. The factor $Q = P^T$ was not explicitly computed in the example. One could also obtain the same answer using Givens rotations (compare [Mey00, Example 5.7.2]).
Theorem. Let $A$ be an $n \times n$ nonsingular real matrix. Then the factorization $A = QR$ with $n \times n$ orthogonal matrix $Q$ and $n \times n$ upper triangular matrix $R$ with positive diagonal entries is unique.
Remark. In this $n \times n$ case the reduced and full QR factorizations coincide, i.e., the results obtained via Gram Schmidt, Householder and Givens should be identical.
Proof. Assume we have two QR factorizations
\[ A = Q_1R_1 = Q_2R_2 \;\Longrightarrow\; Q_2^TQ_1 = R_2R_1^{-1} =: U. \]
Now, $R_2R_1^{-1}$ is upper triangular with positive diagonal (since each factor is) and $Q_2^TQ_1$ is orthogonal. Therefore $U$ has all of these properties. Since $U$ is upper triangular, its first column is
\[ U_1 = (u_{11}, 0, \ldots, 0)^T. \]
Moreover, since $U$ is orthogonal, $\|U_1\|_2 = 1$, and positivity of the diagonal gives $u_{11} = 1$.

Next,
\[ U_1^TU_2 = \begin{pmatrix} 1 & 0 & \cdots & 0 \end{pmatrix}\begin{pmatrix} u_{12} \\ u_{22} \\ 0 \\ \vdots \\ 0 \end{pmatrix} = u_{12} = 0 \]
since the columns of $U$ are orthogonal, and the fact that $\|U_2\|_2 = 1$ then implies $u_{22} = 1$. Comparing all the other pairs of columns of $U$ shows that $U = I$, and therefore $Q_1 = Q_2$ and $R_1 = R_2$. $\square$
Recommendations (so far) for solution of $Ax = b$:
1. If $A$ is square and nonsingular, then use LU factorization with partial pivoting. This is stable for most practical problems and requires $O(n^3/3)$ operations.
2. To find a least squares solution, use QR factorization:
\[ Ax = b \;\Longleftrightarrow\; QRx = b \;\Longleftrightarrow\; Rx = Q^Tb. \]
Usually the reduced QR factorization is all that's needed.
Even though (for square nonsingular $A$) the Gram Schmidt, Householder and Givens versions of the QR factorization are equivalent (due to the uniqueness theorem), we have for general $A$ that
- classical GS is not stable,
- modified GS is stable for least squares, but unstable for QR (since it has problems maintaining orthogonality),
- Householder and Givens are stable, both for least squares and QR.
Computational cost (for $n \times n$ matrices):
- LU with partial pivoting: $O(n^3/3)$
- Gram Schmidt: $O(n^3)$
- Householder: $O(2n^3/3)$
- Givens: $O(4n^3/3)$
Householder reflections are often the preferred method since they provide both stability and decent efficiency.
Complementary Subspaces
Definition. Let $V$ be a vector space and $X, Y \subseteq V$ be subspaces. $X$ and $Y$ are called complementary provided
\[ V = X + Y \qquad \text{and} \qquad X \cap Y = \{0\}. \]
In this case, $V$ is also called the direct sum of $X$ and $Y$, and we write $V = X \oplus Y$.
Example.
- Any two distinct lines through the origin in $\mathbb{R}^2$ are complementary.
- Any plane through the origin in $\mathbb{R}^3$ is complementary to any line through the origin not contained in the plane.
- Two planes through the origin in $\mathbb{R}^3$ are not complementary since they must intersect in (at least) a line.
Theorem. Let $V$ be a vector space, and $X, Y \subseteq V$ be subspaces with bases $B_X$ and $B_Y$. The following are equivalent:
1. $V = X \oplus Y$.
2. For every $v \in V$ there exist unique $x \in X$ and $y \in Y$ such that $v = x + y$.
3. $B_X \cap B_Y = \emptyset$ and $B_X \cup B_Y$ is a basis for $V$.
Proof. See [Mey00]. $\square$
Definition. Suppose $V = X \oplus Y$, i.e., any $v \in V$ can be uniquely decomposed as $v = x + y$. Then
1. $x$ is called the projection of $v$ onto $X$ along $Y$.
2. $y$ is called the projection of $v$ onto $Y$ along $X$.
Properties of projectors
Theorem. Let $X, Y$ be complementary subspaces of $V$. Let $P$, defined by $Pv = x$, be the projector onto $X$ along $Y$. Then
1. $P$ is unique.
2. $P^2 = P$, i.e., $P$ is idempotent.
3. $I - P$ is the complementary projector (onto $Y$ along $X$).
4. $R(P) = \{x : Px = x\} = X$ (the fixed points of $P$).
5. $N(I - P) = X = R(P)$ and $R(I - P) = N(P) = Y$.
6. If $V = \mathbb{R}^n$ (or $\mathbb{C}^n$), then
\[ P = \begin{pmatrix} X & O \end{pmatrix}\begin{pmatrix} X & Y \end{pmatrix}^{-1} = \begin{pmatrix} X & Y \end{pmatrix}\begin{pmatrix} I & O \\ O & O \end{pmatrix}\begin{pmatrix} X & Y \end{pmatrix}^{-1}, \]
where the columns of the matrices $X$ and $Y$ are bases for the subspaces $X$ and $Y$.
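Formula (6) can be illustrated on a hand-picked pair of complementary lines in $\mathbb{R}^2$ (these lines are an arbitrary test choice, not from the slides). Note the resulting projector is idempotent but not symmetric, since the projection is oblique:

```python
# Projector onto X = span{(1,0)^T} along Y = span{(1,1)^T} via
# P = [X | O] [X | Y]^{-1}, where [X | Y] = [[1, 1], [0, 1]].
B = [[1.0, 1.0], [0.0, 1.0]]                   # columns: basis of X, basis of Y
det = B[0][0] * B[1][1] - B[0][1] * B[1][0]
Binv = [[B[1][1] / det, -B[0][1] / det], [-B[1][0] / det, B[0][0] / det]]

XO = [[1.0, 0.0], [0.0, 0.0]]                  # [X | O]
P = [[sum(XO[i][k] * Binv[k][j] for k in range(2)) for j in range(2)]
     for i in range(2)]

assert P == [[1.0, -1.0], [0.0, 0.0]]
# P fixes X and annihilates Y:
assert [P[0][0], P[1][0]] == [1.0, 0.0]                       # P (1,0)^T = (1,0)^T
assert [P[0][0] + P[0][1], P[1][0] + P[1][1]] == [0.0, 0.0]   # P (1,1)^T = 0
```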
Proof.
1. Assume $P_1v = x = P_2v$ for all $v \in V$. But then $P_1 = P_2$.
2. We know $Pv = x$ for every $v \in V$, so that $P^2v = P(Pv) = Px = x = Pv$. Together we therefore have $P^2 = P$.
3. Using the unique decomposition of $v$ we can write
\[ v = x + y = Pv + y \;\Longrightarrow\; (I - P)v = y, \]
the projection of $v$ onto $Y$ along $X$.
4. Note that $x \in R(P)$ if and only if $x = Px$. This is true since if $x = Px$ then $x$ is obviously in $R(P)$. On the other hand, if $x \in R(P)$ then $x = Pv$ for some $v \in V$, and so
\[ Px = P^2v \stackrel{(2)}{=} Pv = x. \]
Therefore
\[ R(P) = \{x : x = Pv, \; v \in V\} = X = \{x : Px = x\}. \]
5. Since $N(I - P) = \{x : (I - P)x = 0\}$ and $(I - P)x = 0 \iff Px = x$, we have $N(I - P) = X = R(P)$. The claim $R(I - P) = Y = N(P)$ is shown similarly.
6. Take $B = \begin{pmatrix} X & Y \end{pmatrix}$, where the columns of $X$ and $Y$ form bases for $X$ and $Y$, respectively. Then the columns of $B$ form a basis for $V$ and $B$ is nonsingular. From above we have $Px = x$, where $x$ can be any column of $X$. Also, $Py = 0$, where $y$ is any column of $Y$. So
\[ PB = P\begin{pmatrix} X & Y \end{pmatrix} = \begin{pmatrix} X & O \end{pmatrix} \;\Longrightarrow\; P = \begin{pmatrix} X & O \end{pmatrix}B^{-1} = \begin{pmatrix} X & O \end{pmatrix}\begin{pmatrix} X & Y \end{pmatrix}^{-1}. \]
This establishes the first part of (6). The second part follows by noting that
\[ B\begin{pmatrix} I & O \\ O & O \end{pmatrix} = \begin{pmatrix} X & Y \end{pmatrix}\begin{pmatrix} I & O \\ O & O \end{pmatrix} = \begin{pmatrix} X & O \end{pmatrix}. \qquad \square \]
We just saw that any projector is idempotent, i.e., $P^2 = P$. In fact:
Theorem. A matrix $P$ is a projector if and only if $P^2 = P$.
Proof. One direction is given above. For the other see [Mey00]. $\square$
Remark. This theorem is sometimes used to define projectors.
Angle between subspaces
In some applications, e.g., when determining the convergence rates of iterative algorithms, it is useful to know the angle between subspaces. If $R, N$ are complementary then
\[ \sin\theta = \frac{1}{\|P\|_2} = \frac{1}{\sqrt{\lambda_{\max}}} = \frac{1}{\sigma_1}, \]
where $P$ is the projector onto $R$ along $N$, $\lambda_{\max}$ is the largest eigenvalue of $P^TP$, and $\sigma_1$ is the largest singular value of $P$. See [Mey00, Example 5.9.2] for more details.
Remark. We will skip [Mey00, Section 5.10] on the range nullspace decomposition. While the range nullspace decomposition is theoretically important, its practical usefulness is limited because its computation is very unstable due to lack of orthogonality. This also means we will not discuss nilpotent matrices, and later on the Jordan normal form.
Orthogonal Decomposition
Definition. Let $V$ be an inner product space and $M \subseteq V$. The orthogonal complement $M^\perp$ of $M$ is
\[ M^\perp = \{x \in V : \langle m, x\rangle = 0 \text{ for all } m \in M\}. \]
Remark. Even if $M$ is not a subspace of $V$ (i.e., only a subset), $M^\perp$ is (see HW).
Theorem. Let $V$ be an inner product space and $M \subseteq V$. If $M$ is a subspace of $V$, then $V = M \oplus M^\perp$.
Proof. According to the definition of complementary subspaces we need to show
1. $M \cap M^\perp = \{0\}$,
2. $M + M^\perp = V$.
1. Let's assume there exists an $x \in M \cap M^\perp$, i.e., $x \in M$ and $x \in M^\perp$. The definition of $M^\perp$ implies $\langle x, x\rangle = 0$. But then the definition of an inner product implies $x = 0$. This is true for any $x \in M \cap M^\perp$, so $x = 0$ is the only such vector.
2. We let $B_M$ and $B_{M^\perp}$ be ON bases for $M$ and $M^\perp$, respectively. Since $M \cap M^\perp = \{0\}$ we know that $B_M \cup B_{M^\perp}$ is an ON basis for some subspace $S \subseteq V$. In fact, $S = V$, since otherwise we could extend $B_M \cup B_{M^\perp}$ to an ON basis of $V$ (using the extension theorem and GS). However, any vector in the extension must be orthogonal to $M$, i.e., in $M^\perp$, but this is not possible since the extended basis must be linearly independent. Therefore the extension set is empty. $\square$
Theorem. Let $V$ be an inner product space with $\dim(V) = n$ and let $M$ be a subspace of $V$. Then
1. $\dim M^\perp = n - \dim M$,
2. $M^{\perp\perp} = M$.
Proof. For (1) recall our dimension formula from Chapter 4:
\[ \dim(X + Y) = \dim X + \dim Y - \dim(X \cap Y). \]
Here $M \cap M^\perp = \{0\}$, so that $\dim(M \cap M^\perp) = 0$. Also, since $M$ is a subspace of $V$ we have $V = M + M^\perp$, and the dimension formula implies (1).
2. Instead of directly establishing equality we first show that $M^{\perp\perp} \subseteq M$. Since $M \oplus M^\perp = V$, any $x \in V$ can be uniquely decomposed into $x = m + n$ with $m \in M$, $n \in M^\perp$. Now we take $x \in M^{\perp\perp}$, so that $\langle x, n\rangle = 0$ for all $n \in M^\perp$, and therefore
\[ 0 = \langle x, n\rangle = \langle m + n, n\rangle = \underbrace{\langle m, n\rangle}_{=0} + \langle n, n\rangle. \]
But $\langle n, n\rangle = 0 \iff n = 0$, and therefore $x = m$ is in $M$.
Now, recall from Chapter 4 that for subspaces $X \subseteq Y$,
\[ \dim X = \dim Y \;\Longrightarrow\; X = Y. \]
We take $X = M^{\perp\perp}$ and $Y = M$ (and know from the work just performed that $M^{\perp\perp}$ is a subspace of $M$). From (1) we know
\[ \dim M^\perp = n - \dim M \;\Longrightarrow\; \dim M^{\perp\perp} = n - \dim M^\perp = n - (n - \dim M) = \dim M. \]
But then $M^{\perp\perp} = M$. $\square$
Back to Fundamental Subspaces
Theorem. Let $A$ be a real $m \times n$ matrix. Then
1. $R(A)^\perp = N(A^T)$,
2. $N(A)^\perp = R(A^T)$.
Corollary.
\[ \underbrace{\mathbb{R}^m = R(A) \oplus R(A)^\perp}_{\mathbb{R}^m} = R(A) \oplus N(A^T), \qquad \underbrace{\mathbb{R}^n = N(A) \oplus N(A)^\perp}_{\mathbb{R}^n} = N(A) \oplus R(A^T). \]
Proof (of Theorem).
1. We show that $x \in R(A)^\perp$ if and only if $x \in N(A^T)$:
\[ x \in R(A)^\perp \iff \langle Ay, x\rangle = 0 \text{ for any } y \in \mathbb{R}^n \iff y^TA^Tx = 0 \text{ for any } y \in \mathbb{R}^n \iff \langle y, A^Tx\rangle = 0 \text{ for any } y \in \mathbb{R}^n \iff A^Tx = 0 \iff x \in N(A^T), \]
by the definitions of these subspaces and of an inner product.
2. Using (1), we have
\[ R(A)^\perp \stackrel{(1)}{=} N(A^T) \;\Longrightarrow\; R(A) = N(A^T)^\perp \;\stackrel{A \leftrightarrow A^T}{\Longrightarrow}\; R(A^T) = N(A)^\perp. \qquad \square \]
Starting to think about the SVD
The decompositions of $\mathbb{R}^m$ and $\mathbb{R}^n$ from the corollary help prepare for the SVD of an $m \times n$ matrix $A$. Assume $\operatorname{rank}(A) = r$ and let
\[ B_{R(A)} = \{u_1, \ldots, u_r\} \text{ ON basis for } R(A) \subseteq \mathbb{R}^m, \qquad B_{N(A^T)} = \{u_{r+1}, \ldots, u_m\} \text{ ON basis for } N(A^T) \subseteq \mathbb{R}^m, \]
\[ B_{R(A^T)} = \{v_1, \ldots, v_r\} \text{ ON basis for } R(A^T) \subseteq \mathbb{R}^n, \qquad B_{N(A)} = \{v_{r+1}, \ldots, v_n\} \text{ ON basis for } N(A) \subseteq \mathbb{R}^n. \]
By the corollary, $B_{R(A)} \cup B_{N(A^T)}$ is an ON basis for $\mathbb{R}^m$ and $B_{R(A^T)} \cup B_{N(A)}$ is an ON basis for $\mathbb{R}^n$, and therefore the following are orthogonal matrices:
\[ U = \begin{pmatrix} u_1 & u_2 & \cdots & u_m \end{pmatrix}, \qquad V = \begin{pmatrix} v_1 & v_2 & \cdots & v_n \end{pmatrix}. \]

Consider
\[ R = U^TAV = (u_i^TAv_j)_{i,j=1}^{m,n}. \]
Note that
\[ Av_j = 0, \quad j = r+1, \ldots, n, \qquad u_i^TA = 0^T \iff A^Tu_i = 0, \quad i = r+1, \ldots, m, \]
so
\[ R = \begin{pmatrix} u_1^TAv_1 & \cdots & u_1^TAv_r & \\ \vdots & & \vdots & O \\ u_r^TAv_1 & \cdots & u_r^TAv_r & \\ & O & & O \end{pmatrix} = \begin{pmatrix} C_{r\times r} & O \\ O & O \end{pmatrix}. \]

Thus
\[ R = U^TAV = \begin{pmatrix} C_{r\times r} & O \\ O & O \end{pmatrix} \;\Longleftrightarrow\; A = URV^T = U\begin{pmatrix} C_{r\times r} & O \\ O & O \end{pmatrix}V^T, \]
the URV factorization of $A$.
Remark. The matrix $C_{r\times r}$ is nonsingular since
\[ \operatorname{rank}(C) = \operatorname{rank}(U^TAV) = \operatorname{rank}(A) = r, \]
because multiplication by the orthogonal (and therefore nonsingular) matrices $U^T$ and $V$ does not change the rank of $A$.
We have now shown that ON bases for the fundamental subspaces of $A$ yield a URV factorization. As we show next, the converse is also true, i.e., any URV factorization of $A$ yields ON bases for the fundamental subspaces of $A$. However, the URV factorization is not unique. Different ON bases result in different factorizations.
Consider $A = URV^T$ with $U, V$ orthogonal $m \times m$ and $n \times n$ matrices, respectively, and $R = \begin{pmatrix} C & O \\ O & O \end{pmatrix}$ with $C$ nonsingular. We partition
\[ U = \begin{pmatrix} \underbrace{U_1}_{m\times r} & \underbrace{U_2}_{m\times(m-r)} \end{pmatrix}, \qquad V = \begin{pmatrix} \underbrace{V_1}_{n\times r} & \underbrace{V_2}_{n\times(n-r)} \end{pmatrix}. \]
Then $V$ (and therefore also $V^T$) is nonsingular and we see that
\[ R(A) = R(URV^T) = R(UR) = R\big(\begin{pmatrix} U_1C & O \end{pmatrix}\big) = R(\underbrace{U_1C}_{m\times r}) \stackrel{\operatorname{rank}(C)=r}{=} R(U_1), \quad (12) \]
so that the columns of $U_1$ are an ON basis for $R(A)$.
Moreover,
\[ N(A^T) \stackrel{\text{prev. thm}}{=} R(A)^\perp \stackrel{(12)}{=} R(U_1)^\perp = R(U_2), \]
since $U$ is orthogonal and $\mathbb{R}^m = R(U_1) \oplus R(U_2)$. This implies that the columns of $U_2$ are an ON basis for $N(A^T)$. The other two cases can be argued similarly, using $N(AB) = N(B)$ provided $\operatorname{rank}(A) = n$.
The main difference between a URV factorization and the SVD is that the SVD will contain a diagonal matrix $\Sigma$ with $r$ nonzero singular values, while $R$ contains the full $r \times r$ block $C$. As a first step in this direction, we can easily obtain a URV factorization of $A$ with a lower triangular matrix $C$. Idea: use Householder reflections (or Givens rotations).
Consider an $m \times n$ matrix $A$. We apply an $m \times m$ orthogonal (Householder reflection) matrix $P$ so that
\[ PA = \begin{pmatrix} B \\ O \end{pmatrix}, \quad \text{with } r \times n \text{ matrix } B, \; \operatorname{rank}(B) = r. \]
Next, use an $n \times n$ orthogonal $Q$ as follows:
\[ QB^T = \begin{pmatrix} T \\ O \end{pmatrix}, \quad \text{with } r \times r \text{ upper triangular } T, \; \operatorname{rank}(T) = r. \]
Then
\[ BQ^T = \begin{pmatrix} T^T & O \end{pmatrix} \;\Longrightarrow\; B = \begin{pmatrix} T^T & O \end{pmatrix}Q \;\Longrightarrow\; PA = \begin{pmatrix} B \\ O \end{pmatrix} = \begin{pmatrix} T^T & O \\ O & O \end{pmatrix}Q. \]
Together,
\[ PA = \begin{pmatrix} T^T & O \\ O & O \end{pmatrix}Q \;\Longleftrightarrow\; A = P^T\begin{pmatrix} T^T & O \\ O & O \end{pmatrix}Q, \]
a URV factorization with lower triangular block $T^T$.
Remark. See HW for an example of this process with numbers.
Singular Value Decomposition
We know
\[ A = URV^T = U\begin{pmatrix} C & O \\ O & O \end{pmatrix}V^T, \]
where $C$ is upper triangular and $U$, $V$ are orthogonal. Now we want to establish that $C$ can even be made diagonal.
Note that $\|A\|_2 = \|C\|_2 =: \sigma_1$, since multiplication by an orthogonal matrix does not change the 2-norm (see HW). Also, $\|C\|_2 = \max_{\|z\|_2=1}\|Cz\|_2$, so that $\|C\|_2 = \|Cx\|_2$ for some $x$ with $\|x\|_2 = 1$. In fact (see Sect. 5.2), $x$ is such that $(C^TC - \lambda I)x = 0$, i.e., $x$ is an eigenvector of $C^TC$, so that
\[ \|C\|_2 = \sigma_1 = \sqrt{\lambda} = \sqrt{x^TC^TCx}. \quad (13) \]

Since $x$ is a unit vector we can extend it to an orthogonal matrix $R_x = \begin{pmatrix} x & X \end{pmatrix}$, e.g., using Householder reflectors as discussed at the end of Sect. 5.6. Similarly, let
\[ y = \frac{Cx}{\|Cx\|_2} = \frac{Cx}{\sigma_1}. \quad (14) \]
Then $R_y = \begin{pmatrix} y & Y \end{pmatrix}$ is also orthogonal (and symmetric) since it's a Householder reflector.

Now
\[ \underbrace{R_y^T}_{=R_y}CR_x = \begin{pmatrix} y^T \\ Y^T \end{pmatrix}C\begin{pmatrix} x & X \end{pmatrix} = \begin{pmatrix} y^TCx & y^TCX \\ Y^TCx & Y^TCX \end{pmatrix}. \]
From above,
\[ \sigma_1^2 = \lambda \stackrel{(13)}{=} x^TC^TCx \stackrel{(14)}{=} \sigma_1y^TCx \;\Longrightarrow\; y^TCx = \sigma_1. \]
Also,
\[ Y^TCx \stackrel{(14)}{=} Y^T(\sigma_1y) = 0, \]
since $R_y$ is orthogonal, i.e., $y$ is orthogonal to the columns of $Y$.

Let $Y^TCX = C_2$ and $y^TCX = c^T$, so that
\[ R_yCR_x = \begin{pmatrix} \sigma_1 & c^T \\ 0 & C_2 \end{pmatrix}. \]
To show that $c^T = 0^T$ consider
\[ c^T = y^TCX \stackrel{(14)}{=} \frac{(Cx)^T}{\sigma_1}CX = \frac{x^TC^TCX}{\sigma_1}. \quad (15) \]
From (13), $x$ is an eigenvector of $C^TC$, i.e.,
\[ C^TCx = \lambda x = \sigma_1^2x \;\Longrightarrow\; x^TC^TC = \sigma_1^2x^T. \]
Plugging this into (15) yields $c^T = \sigma_1x^TX = 0^T$, since $R_x = \begin{pmatrix} x & X \end{pmatrix}$ is orthogonal.

Moreover, $\sigma_1 \geq \|C_2\|_2$ since
\[ \sigma_1 = \|C\|_2 \stackrel{\text{HW}}{=} \|R_yCR_x\|_2 = \max\{\sigma_1, \|C_2\|_2\}. \]
Next, we repeat this process for $C_2$, i.e.,
\[ S_yC_2S_x = \begin{pmatrix} \sigma_2 & 0^T \\ 0 & C_3 \end{pmatrix} \quad \text{with } \sigma_2 \geq \|C_3\|_2. \]
Let
\[ P_2 = \begin{pmatrix} 1 & 0^T \\ 0 & S_y^T \end{pmatrix}R_y^T, \qquad Q_2 = R_x\begin{pmatrix} 1 & 0^T \\ 0 & S_x \end{pmatrix}. \]
Then
\[ P_2CQ_2 = \begin{pmatrix} \sigma_1 & 0 & 0^T \\ 0 & \sigma_2 & 0^T \\ 0 & 0 & C_3 \end{pmatrix} \quad \text{with } \sigma_1 \geq \sigma_2 \geq \|C_3\|_2. \]
We continue this until
\[ P_{r-1}CQ_{r-1} = \begin{pmatrix} \sigma_1 & & & \\ & \sigma_2 & & \\ & & \ddots & \\ & & & \sigma_r \end{pmatrix} = D, \qquad \sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_r. \]
Finally, let
\[ \tilde U^T = \begin{pmatrix} P_{r-1} & O \\ O & I \end{pmatrix}U^T, \qquad \tilde V = V\begin{pmatrix} Q_{r-1} & O \\ O & I \end{pmatrix}. \]
Together,
\[ \tilde U^TA\tilde V = \begin{pmatrix} D & O \\ O & O \end{pmatrix}, \]
or, without the tildes, the singular value decomposition (SVD) of $A$:
\[ A = U\begin{pmatrix} D & O \\ O & O \end{pmatrix}V^T, \]
where $A$ is $m \times n$, $U$ is $m \times m$, $D$ is $r \times r$, and $V$ is $n \times n$.
We use the following terminology:
- singular values: $\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_r > 0$,
- left singular vectors: columns of $U$,
- right singular vectors: columns of $V$.
Remark. In Chapter 7 we will see that the columns of $U$ and $V$ are also special eigenvectors (of $AA^T$ and $A^TA$, respectively).
Geometric interpretation of SVD
For the following we assume $A \in \mathbb{R}^{n\times n}$, $n = 2$.
[Figure: the unit circle, spanned by the right singular vectors $v_1, v_2$, is mapped by $A$ to an ellipse with semi-axes $\sigma_1u_1$ and $\sigma_2u_2$.]
This picture is true since
\[ A = UDV^T \;\Longleftrightarrow\; AV = UD, \]
and $\sigma_1, \sigma_2$ are the lengths of the semi-axes of the ellipse because $\|u_1\| = \|u_2\| = 1$.
Remark. See [Mey00] for more details.
For general $n$, $A$ transforms the 2-norm unit sphere into an ellipsoid whose semi-axes have lengths $\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_n$. Therefore,
\[ \kappa_2(A) = \frac{\sigma_1}{\sigma_n} \]
is the distortion ratio of the transformation $A$. Moreover,
\[ \sigma_1 = \|A\|_2, \qquad \frac{1}{\sigma_n} = \|A^{-1}\|_2, \]
so that $\kappa_2(A) = \|A\|_2\|A^{-1}\|_2$ is the 2-norm condition number of $A$ ($\in \mathbb{R}^{n\times n}$, nonsingular).
Remark. The relations for $\sigma_1$ and $\sigma_n$ hold because
\[ \|A\|_2 = \|UDV^T\|_2 \stackrel{\text{HW}}{=} \|D\|_2 = \sigma_1, \qquad \|A^{-1}\|_2 = \|VD^{-1}U^T\|_2 \stackrel{\text{HW}}{=} \|D^{-1}\|_2 = \frac{1}{\sigma_n}. \]
Remark. We always have $\kappa_2(A) \geq 1$, and $\kappa_2(A) = 1$ if and only if $A$ is a multiple of an orthogonal matrix (typo in [Mey00], see proof on next slide).
Proof.
$\Longleftarrow$: Assume $A = \alpha Q$ with $\alpha > 0$, $Q$ orthogonal. Then
\[ \|A\|_2 = \alpha\|Q\|_2 = \alpha\max_{\|x\|_2=1}\|Qx\|_2 \stackrel{\text{invariance}}{=} \alpha\max_{\|x\|_2=1}\|x\|_2 = \alpha. \]
Also,
\[ A^TA = \alpha^2Q^TQ = \alpha^2I \;\Longrightarrow\; A^{-1} = \frac{1}{\alpha^2}A^T, \]
and $\|A^T\|_2 = \|A\|_2$, so that $\|A^{-1}\|_2 = \frac{1}{\alpha}$ and
\[ \kappa_2(A) = \|A\|_2\|A^{-1}\|_2 = \alpha\frac{1}{\alpha} = 1. \]

$\Longrightarrow$: Assume $\kappa_2(A) = \frac{\sigma_1}{\sigma_n} = 1$, so that $\sigma_1 = \sigma_n$ and therefore $D = \sigma_1I$. Thus
\[ A = UDV^T = \sigma_1UV^T \]
and
\[ A^TA = \sigma_1^2(UV^T)^TUV^T = \sigma_1^2VU^TUV^T = \sigma_1^2I, \]
i.e., $A$ is $\sigma_1$ times the orthogonal matrix $UV^T$. $\square$
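For $2 \times 2$ matrices the singular values, and hence $\kappa_2$, have a closed form (the square roots of the eigenvalues of $A^TA$), which lets us check both directions of the statement numerically. The matrices below are arbitrary test choices, not from the slides:

```python
import math

def kappa2(A):
    """2-norm condition number of a nonsingular 2x2 matrix: sigma_1 / sigma_2,
    computed from the eigenvalues of A^T A via the quadratic formula."""
    (a, b), (c, d) = A
    t = a*a + b*b + c*c + d*d          # trace of A^T A
    det2 = (a*d - b*c) ** 2            # det(A^T A) = det(A)^2
    disc = math.sqrt(max(t*t - 4.0*det2, 0.0))
    s1 = math.sqrt((t + disc) / 2.0)
    s2 = math.sqrt((t - disc) / 2.0)
    return s1 / s2

theta = 0.3
c, s = math.cos(theta), math.sin(theta)
# A multiple of an orthogonal (rotation) matrix has kappa_2 = 1 ...
assert abs(kappa2([[3*c, -3*s], [3*s, 3*c]]) - 1.0) < 1e-12
# ... while diag(1, 0.01) distorts the unit circle by a factor of 100.
assert abs(kappa2([[1.0, 0.0], [0.0, 0.01]]) - 100.0) < 1e-8
```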
Applications of the Condition Number
Let $\tilde x$ be the answer obtained by solving $Ax = b$ with $A \in \mathbb{R}^{n\times n}$ nonsingular. Is a small residual
\[ r = b - A\tilde x \]
a good indicator for the accuracy of $\tilde x$? Since $x$ is the exact answer and $\tilde x$ the computed answer, we measure the relative error $\frac{\|x - \tilde x\|}{\|x\|}$.

Now
\[ \|r\| = \|b - A\tilde x\| = \|Ax - A\tilde x\| = \|A(x - \tilde x)\| \leq \|A\|\,\|x - \tilde x\|. \]
To get the relative error we multiply by $\frac{\|A^{-1}b\|}{\|x\|} = 1$. Then
\[ \|r\| \leq \|A\|\,\|x - \tilde x\|\frac{\|A^{-1}b\|}{\|x\|} \leq \|A\|\,\|A^{-1}\|\,\|b\|\frac{\|x - \tilde x\|}{\|x\|} \;\Longrightarrow\; \frac{\|r\|}{\|b\|} \leq \kappa(A)\frac{\|x - \tilde x\|}{\|x\|}. \quad (16) \]

Moreover, using $r = b - A\tilde x$,
\[ x - \tilde x = A^{-1}(b - A\tilde x) = A^{-1}r \;\Longrightarrow\; \|x - \tilde x\| \leq \|A^{-1}\|\,\|r\|. \]
Multiplying by $\frac{\|Ax\|}{\|b\|} = 1$ we have
\[ \frac{\|x - \tilde x\|}{\|x\|} \leq \kappa(A)\frac{\|r\|}{\|b\|}. \quad (17) \]
Combining (16) and (17) yields
\[ \frac{1}{\kappa(A)}\frac{\|r\|}{\|b\|} \leq \frac{\|x - \tilde x\|}{\|x\|} \leq \kappa(A)\frac{\|r\|}{\|b\|}. \]
Therefore, the relative residual $\frac{\|r\|}{\|b\|}$ is a good indicator of the relative error if and only if $A$ is well conditioned, i.e., $\kappa(A)$ is small (close to 1).
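The danger described by (16) and (17) is easy to exhibit. In the hand-picked ill-conditioned system below (an arbitrary illustration, not from the slides), a badly wrong "computed" solution still produces a tiny relative residual:

```python
import math

# Ill-conditioned 2x2 system: the rows of A are nearly parallel.
A = [[1.0, 1.0], [1.0, 1.0001]]
x = [1.0, 1.0]                         # exact solution
b = [2.0, 2.0001]                      # b = A x
xt = [2.0, 0.0]                        # a wildly wrong "computed" answer

r = [b[i] - sum(A[i][j] * xt[j] for j in range(2)) for i in range(2)]
norm = lambda v: math.sqrt(sum(c * c for c in v))

rel_residual = norm(r) / norm(b)
rel_error = norm([x[0] - xt[0], x[1] - xt[1]]) / norm(x)

assert rel_residual < 1e-4             # the residual looks excellent ...
assert rel_error > 0.99                # ... yet the answer is 100% wrong
```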
Applications of the SVD
1. Determination of the numerical rank of $A$: take $\operatorname{rank}(A) \approx$ the index of the smallest singular value greater than or equal to a desired threshold.
2. Low-rank approximation of $A$: the Eckart Young theorem states that
\[ A_k = \sum_{i=1}^k \sigma_iu_iv_i^T \]
is the best rank-$k$ approximation to $A$ in the 2-norm (also the Frobenius norm), i.e.,
\[ \|A - A_k\|_2 = \min_{\operatorname{rank}(B)=k}\|A - B\|_2. \]
Moreover, $\|A - A_k\|_2 = \sigma_{k+1}$. Run SVD_movie.m.
3. Stable solution of least squares problems: use the Moore Penrose pseudoinverse.
Definition. Let $A \in \mathbb{R}^{m\times n}$ and let
\[ A = U\begin{pmatrix} D & O \\ O & O \end{pmatrix}V^T \]
be the SVD of $A$. Then
\[ A^\dagger = V\begin{pmatrix} D^{-1} & O \\ O & O \end{pmatrix}U^T \]
is called the Moore Penrose pseudoinverse of $A$.
Remark. Note that $A^\dagger \in \mathbb{R}^{n\times m}$ and
\[ A^\dagger = \sum_{i=1}^r \frac{v_iu_i^T}{\sigma_i}, \qquad r = \operatorname{rank}(A). \]
We now show that the least squares solution of $Ax = b$ is given by $x = A^\dagger b$.
Start with the normal equations and use
\[ A = U\begin{pmatrix} D & O \\ O & O \end{pmatrix}V^T = \tilde UD\tilde V^T, \]
the reduced SVD of $A$, i.e., $\tilde U \in \mathbb{R}^{m\times r}$, $\tilde V \in \mathbb{R}^{n\times r}$:
\[ A^TAx = A^Tb \;\Longleftrightarrow\; \tilde VD\underbrace{\tilde U^T\tilde U}_{=I}D\tilde V^Tx = \tilde VD\tilde U^Tb \;\Longleftrightarrow\; \tilde VD^2\tilde V^Tx = \tilde VD\tilde U^Tb. \]
Multiplication by $D^{-1}\tilde V^T$ yields
\[ D\tilde V^Tx = \tilde U^Tb. \]
Thus
\[ D\tilde V^Tx = \tilde U^Tb \;\Longrightarrow\; x = \tilde VD^{-1}\tilde U^Tb = V\begin{pmatrix} D^{-1} & O \\ O & O \end{pmatrix}U^Tb = A^\dagger b. \]
Remark. If $A$ is nonsingular then $A^\dagger = A^{-1}$ (see HW). If $\operatorname{rank}(A) < n$ (i.e., the least squares solution is not unique), then $x = A^\dagger b$ provides the unique solution with minimum 2-norm (see justification on the following slide).
Minimum norm solution of underdetermined systems
Note that the general solution of $Ax = b$ is given by
\[ z = A^\dagger b + n, \qquad n \in N(A). \]
Then
\[ \|z\|_2^2 = \|A^\dagger b + n\|_2^2 \stackrel{\text{Pythag. thm}}{=} \|A^\dagger b\|_2^2 + \|n\|_2^2 \geq \|A^\dagger b\|_2^2. \]
The Pythagorean theorem applies since (see HW)
\[ A^\dagger b \in R(A^\dagger) = R(A^T), \]
so that, using $R(A^T) = N(A)^\perp$, $A^\dagger b \perp n$.
Remark. Explicit use of the pseudoinverse is usually not recommended. Instead we solve $Ax = b$, $A \in \mathbb{R}^{m\times n}$, by
1. computing $A = \tilde UD\tilde V^T$ (reduced SVD),
2. using $Ax = b \iff D\tilde V^Tx = \tilde U^Tb$, i.e.,
   - solve $Dy = \tilde U^Tb$ for $y$,
   - compute $x = \tilde Vy$.
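The minimum-norm property is easy to see on a tiny hand-picked example (not from the slides): the rank-1 system $x_1 + x_2 = 1$, i.e. $A = (1 \; 1)$. Its reduced SVD is trivial, $\sigma_1 = \sqrt{2}$, $u_1 = 1$, $v_1 = (1,1)^T/\sqrt{2}$, so $A^\dagger b = (b/2, b/2)^T$:

```python
import math

# A^+ b = v_1 (1/sigma_1) u_1^T b = (b/2, b/2)^T for A = [1 1].
b = 1.0
x_dagger = [b / 2.0, b / 2.0]
assert x_dagger[0] + x_dagger[1] == b          # it solves x_1 + x_2 = b

# Every solution is A^+ b + n with n = t (1,-1)^T in N(A); none is shorter,
# since A^+ b is orthogonal to N(A).
norm = lambda v: math.sqrt(sum(c * c for c in v))
for t in [-2.0, -0.5, 0.3, 1.7]:
    z = [x_dagger[0] + t, x_dagger[1] - t]
    assert norm(z) >= norm(x_dagger)
```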
Other Applications
Also known as principal component analysis (PCA), (discrete) Karhunen-Loève (KL) transformation, Hotelling transform, or proper orthogonal decomposition (POD):
- Data compression
- Noise filtering
- Regularization of inverse problems
- Tomography
- Image deblurring
- Seismology
- Information retrieval and data mining (latent semantic analysis)
- Bioinformatics and computational biology
- Immunology
- Molecular dynamics
- Microarray data analysis
Orthogonal Projections
Earlier we discussed orthogonal complementary subspaces of an inner product space $V$, i.e.,
\[ V = M \oplus M^\perp. \]
Definition. Consider $V = M \oplus M^\perp$, so that for every $v \in V$ there exist unique vectors $m \in M$, $n \in M^\perp$ such that $v = m + n$. Then
- $m$ is called the orthogonal projection of $v$ onto $M$.
- The matrix $P_M$ such that $P_Mv = m$ is the orthogonal projector onto $M$ along $M^\perp$.
For arbitrary complementary subspaces $X, Y$ we showed earlier that the projector onto $X$ along $Y$ is given by
\[ P = \begin{pmatrix} X & O \end{pmatrix}\begin{pmatrix} X & Y \end{pmatrix}^{-1} = \begin{pmatrix} X & Y \end{pmatrix}\begin{pmatrix} I & O \\ O & O \end{pmatrix}\begin{pmatrix} X & Y \end{pmatrix}^{-1}, \]
where the columns of $X$ and $Y$ are bases for $X$ and $Y$.
Orthogonal Projections Now we let X = M and Y = M be orthogonal complementary subspaces, where M and N contain the basis vectors of M and M in their columns. Then P = ( M O ) ( M N ) 1. (18) To find ( M N ) 1 we note that M T N = N T M = O and if N is an orthogonal matrix (i.e., contains an ON basis), then ( (M T M) 1 M T ) ( ) (M ) I O N = O I N T (note that M T M is invertible since M is full rank because its columns form a basis of M). fasshauer@iit.edu MATH 532 189
Thus
$$\begin{pmatrix} M & N \end{pmatrix}^{-1} = \begin{pmatrix} (M^T M)^{-1} M^T \\ N^T \end{pmatrix}. \qquad (19)$$
Inserting (19) into (18) yields
$$P_{\mathcal{M}} = \begin{pmatrix} M & O \end{pmatrix} \begin{pmatrix} (M^T M)^{-1} M^T \\ N^T \end{pmatrix} = M (M^T M)^{-1} M^T.$$

Remark
Note that $P_{\mathcal{M}}$ is unique, so this formula holds for an arbitrary basis of $\mathcal{M}$ (collected in $M$). In particular, if $M$ contains an ON basis for $\mathcal{M}$, then $P_{\mathcal{M}} = M M^T$.
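As a quick numerical sanity check (not part of the slides), the formula $P_{\mathcal{M}} = M(M^T M)^{-1} M^T$ can be verified in plain Python for a small non-orthonormal basis; the matrix $M$ below is an arbitrary illustrative choice:

```python
# Verify P = M (M^T M)^{-1} M^T is symmetric, idempotent, and fixes M
# for a (non-orthonormal) basis of a 2-dimensional subspace of R^3.
M = [[1.0, 1.0],
     [0.0, 1.0],
     [1.0, 0.0]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

Mt = transpose(M)
G = matmul(Mt, M)                       # Gram matrix M^T M (2x2)
a, b_, c, d = G[0][0], G[0][1], G[1][0], G[1][1]
det = a * d - b_ * c
Ginv = [[d / det, -b_ / det], [-c / det, a / det]]

P = matmul(matmul(M, Ginv), Mt)         # orthogonal projector onto R(M)

close = lambda A, B: all(abs(A[i][j] - B[i][j]) < 1e-12
                         for i in range(len(A)) for j in range(len(A[0])))
print(close(matmul(P, P), P))           # idempotent: P^2 = P -> True
print(close(transpose(P), P))           # symmetric:  P^T = P -> True
print(close(matmul(P, M), M))           # fixes M:    P M = M -> True
```

The same $P$ results from any other basis of the same subspace, matching the uniqueness remark above.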
Similarly,
$$P_{\mathcal{M}^\perp} = N (N^T N)^{-1} N^T \quad \text{(arbitrary basis for } \mathcal{M}^\perp\text{)}, \qquad P_{\mathcal{M}^\perp} = N N^T \quad \text{(ON basis)}.$$
As before, $P_{\mathcal{M}^\perp} = I - P_{\mathcal{M}}$.

Example
If $\mathcal{M} = \operatorname{span}\{u\}$ with $\|u\| = 1$, then $P_{\mathcal{M}} = P_u = u u^T$ and $P_{u^\perp} = I - P_u = I - u u^T$ (cf. elementary orthogonal projectors earlier).
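The elementary projectors $P_u = u u^T$ and $I - u u^T$ from the example can be illustrated directly; the vectors $u$ and $v$ below are illustrative choices:

```python
import math

# Elementary orthogonal projector P_u = u u^T for a unit vector u,
# and its complement I - u u^T, applied to a sample vector v.
u = [1 / math.sqrt(2), 1 / math.sqrt(2), 0.0]
v = [3.0, 1.0, 5.0]

uv = sum(ui * vi for ui, vi in zip(u, v))        # u^T v
p = [uv * ui for ui in u]                        # P_u v = u (u^T v)
q = [vi - pi for vi, pi in zip(v, p)]            # (I - P_u) v

# p lies in span{u}, q is orthogonal to u, and p + q recovers v,
# i.e., the unique decomposition v = m + n from the definition.
print(abs(sum(ui * qi for ui, qi in zip(u, q))) < 1e-12)              # True
print(all(abs(pi + qi - vi) < 1e-12 for pi, qi, vi in zip(p, q, v)))  # True
```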
Properties of orthogonal projectors

Theorem
Let $P \in \mathbb{R}^{n \times n}$ be a projector, i.e., $P^2 = P$. Then $P$ is an orthogonal projector if and only if any one of the following equivalent conditions holds:
1. $R(P) \perp N(P)$,
2. $P^T = P$,
3. $\|P\|_2 = 1$.
Proof.
1. Follows directly from the definition.
2. ($\Longrightarrow$): Assume $P$ is an orthogonal projector, i.e., $P = M (M^T M)^{-1} M^T$. Then
$$P^T = M \underbrace{\left((M^T M)^{-1}\right)^T}_{=(M^T M)^{-1}} M^T = P,$$
since $M^T M$ is symmetric.
($\Longleftarrow$): Assume $P = P^T$. Then
$$R(P) = R(P^T) \stackrel{\text{orth. decomp.}}{=} N(P)^\perp,$$
so that $P$ is an orthogonal projector via (1).
Proof (cont.)
3. For complementary subspaces $\mathcal{X}, \mathcal{Y}$ we know that the angle $\theta \in \left[0, \frac{\pi}{2}\right]$ between $\mathcal{X}$ and $\mathcal{Y}$ satisfies
$$\|P\|_2 = \frac{1}{\sin\theta}.$$
Assume $P$ is an orthogonal projector; then $\theta = \frac{\pi}{2}$, so that $\|P\|_2 = 1$. Conversely, if $\|P\|_2 = 1$, then $\theta = \frac{\pi}{2}$ and $\mathcal{X}, \mathcal{Y}$ are orthogonal complements, i.e., $P$ is an orthogonal projector. $\square$
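Property (3) can be checked numerically: for a symmetric idempotent $P$, $\|P\|_2$ equals its largest eigenvalue, which a short power iteration estimates. The rank-one projector and starting seed below are illustrative choices, not from the slides:

```python
import math, random

# For the orthogonal projector P = u u^T (symmetric, idempotent),
# estimate ||P||_2 as the dominant eigenvalue via power iteration.
u = [2 / 3, 2 / 3, 1 / 3]                       # unit vector in R^3
P = [[u[i] * u[j] for j in range(3)] for i in range(3)]

random.seed(1)
v = [random.random() for _ in range(3)]         # random start, u^T v != 0
norm = 0.0
for _ in range(20):                             # power iteration on P
    w = [sum(P[i][j] * v[j] for j in range(3)) for i in range(3)]
    norm = math.sqrt(sum(wi * wi for wi in w))
    v = [wi / norm for wi in w]

print(abs(norm - 1.0) < 1e-12)                  # True: ||P||_2 = 1
```

Convergence here is immediate: one application of $P$ already lands in $\operatorname{span}\{u\}$, the dominant eigenspace.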
Why is orthogonal projection so important?

Theorem
Let $\mathcal{V}$ be an inner product space with subspace $\mathcal{M}$, and let $b \in \mathcal{V}$. Then
$$\operatorname{dist}(b, \mathcal{M}) = \min_{m \in \mathcal{M}} \|b - m\|_2 = \|b - P_{\mathcal{M}} b\|_2,$$
i.e., $P_{\mathcal{M}} b$ is the unique vector in $\mathcal{M}$ closest to $b$. The quantity $\operatorname{dist}(b, \mathcal{M})$ is called the (orthogonal) distance from $b$ to $\mathcal{M}$.
Proof.
Let $p = P_{\mathcal{M}} b$. Then $p \in \mathcal{M}$ and $p - m \in \mathcal{M}$ for every $m \in \mathcal{M}$. Moreover,
$$b - p = (I - P_{\mathcal{M}}) b \in \mathcal{M}^\perp,$$
so that $(p - m) \perp (b - p)$. Then
$$\|b - m\|_2^2 = \|b - p + p - m\|_2^2 \stackrel{\text{Pythag.}}{=} \|b - p\|_2^2 + \|p - m\|_2^2 \ge \|b - p\|_2^2.$$
Therefore $\min_{m \in \mathcal{M}} \|b - m\|_2 = \|b - p\|_2$.
Proof (cont.)
Uniqueness: Assume there exists a $q \in \mathcal{M}$ such that
$$\|b - q\|_2 = \|b - p\|_2. \qquad (20)$$
Then
$$\|b - q\|_2^2 = \|\underbrace{b - p}_{\in \mathcal{M}^\perp} + \underbrace{p - q}_{\in \mathcal{M}}\|_2^2 \stackrel{\text{Pythag.}}{=} \|b - p\|_2^2 + \|p - q\|_2^2.$$
But then (20) implies that $\|p - q\|_2^2 = 0$, and therefore $p = q$. $\square$
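The best-approximation property of the theorem can be tested empirically: compute $P_{\mathcal{M}} b$ for a small subspace and confirm no other element of $\mathcal{M}$ is closer to $b$. The basis and the vector $b$ below are illustrative choices:

```python
import random

# dist(b, M) = ||b - P_M b||_2: the projection of b onto
# M = span{(1,0,1), (1,1,0)} beats every other element of M.
basis = [[1.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
b = [2.0, -1.0, 4.0]

# Coefficients c of P_M b in the (non-ON) basis solve (M^T M) c = M^T b.
g11 = sum(x * x for x in basis[0]); g22 = sum(x * x for x in basis[1])
g12 = sum(x * y for x, y in zip(basis[0], basis[1]))
r1 = sum(x * y for x, y in zip(basis[0], b))
r2 = sum(x * y for x, y in zip(basis[1], b))
det = g11 * g22 - g12 * g12
c1 = (g22 * r1 - g12 * r2) / det
c2 = (g11 * r2 - g12 * r1) / det
p = [c1 * x + c2 * y for x, y in zip(basis[0], basis[1])]

dist = sum((bi - pi) ** 2 for bi, pi in zip(b, p)) ** 0.5

# Random competitors from M never get closer than dist.
random.seed(0)
for _ in range(100):
    a1, a2 = random.uniform(-5, 5), random.uniform(-5, 5)
    m = [a1 * x + a2 * y for x, y in zip(basis[0], basis[1])]
    assert sum((bi - mi) ** 2 for bi, mi in zip(b, m)) ** 0.5 >= dist - 1e-12
print("P_M b is the closest point in M")
```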
Least squares approximation revisited

Now we give a modern derivation of the normal equations (without calculus), and note that much of this remains true for best $L_2$ approximation.

Goal of least squares: For $A \in \mathbb{R}^{m \times n}$, find
$$\min_{x \in \mathbb{R}^n} \sum_{i=1}^m \left((Ax)_i - b_i\right)^2 \iff \min_{x \in \mathbb{R}^n} \|Ax - b\|_2.$$
Now $Ax \in R(A)$, so the least squares error is
$$\operatorname{dist}(b, R(A)) = \min_{Ax \in R(A)} \|b - Ax\|_2 = \|b - P_{R(A)} b\|_2,$$
with $P_{R(A)}$ the orthogonal projector onto $R(A)$.
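A minimal sketch of this viewpoint: solving the normal equations $A^T A x = A^T b$ computes exactly $P_{R(A)} b = Ax$, so the residual $b - Ax$ must be orthogonal to $R(A)$. The $3 \times 2$ line-fitting data below is an illustrative choice:

```python
# Least squares for a 3x2 system via the normal equations A^T A x = A^T b,
# which realizes b - Ax = (I - P_{R(A)}) b, orthogonal to R(A).
A = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]   # fit a line through 3 points
b = [1.0, 2.0, 2.0]

AtA = [[sum(A[k][i] * A[k][j] for k in range(3)) for j in range(2)]
       for i in range(2)]
Atb = [sum(A[k][i] * b[k] for k in range(3)) for i in range(2)]

det = AtA[0][0] * AtA[1][1] - AtA[0][1] * AtA[1][0]
x = [(AtA[1][1] * Atb[0] - AtA[0][1] * Atb[1]) / det,
     (AtA[0][0] * Atb[1] - AtA[1][0] * Atb[0]) / det]

# The residual b - Ax is orthogonal to R(A): A^T (b - Ax) = 0.
res = [b[k] - sum(A[k][j] * x[j] for j in range(2)) for k in range(3)]
print(all(abs(sum(A[k][i] * res[k] for k in range(3))) < 1e-12
          for i in range(2)))  # True
```

This orthogonality of the residual to every column of $A$ is precisely the "without calculus" derivation of the normal equations.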