MATH 532: Linear Algebra


1 MATH 532: Linear Algebra. Chapter 5: Norms, Inner Products and Orthogonality. Greg Fasshauer, Department of Applied Mathematics, Illinois Institute of Technology, Spring 2015.

2 Outline
1 Vector Norms
2 Matrix Norms
3 Inner Product Spaces
4 Orthogonal Vectors
5 Gram–Schmidt Orthogonalization & QR Factorization
6 Unitary and Orthogonal Matrices
7 Orthogonal Reduction
8 Complementary Subspaces
9 Orthogonal Decomposition
10 Singular Value Decomposition
11 Orthogonal Projections

3 Vector Norms

Definition. Let $x, y \in \mathbb{R}^n$ ($\mathbb{C}^n$). Then
$$x^T y = \sum_{i=1}^n x_i y_i \quad (\mathbb{R}), \qquad x^* y = \sum_{i=1}^n \bar{x}_i y_i \quad (\mathbb{C})$$
is called the standard inner product for $\mathbb{R}^n$ ($\mathbb{C}^n$).

4 Vector Norms

Definition. Let $\mathcal{V}$ be a vector space. A function $\|\cdot\| : \mathcal{V} \to \mathbb{R}_{\ge 0}$ is called a norm provided for any $x, y \in \mathcal{V}$ and $\alpha \in \mathbb{R}$
1 $\|x\| \ge 0$ and $\|x\| = 0$ if and only if $x = 0$,
2 $\|\alpha x\| = |\alpha| \|x\|$,
3 $\|x + y\| \le \|x\| + \|y\|$.

Remark. The inequality in (3) is known as the triangle inequality.

5 Vector Norms

Remark. Any inner product $\langle \cdot, \cdot \rangle$ induces a norm via (more later)
$$\|x\| = \sqrt{\langle x, x \rangle}.$$
We will show that the standard inner product induces the Euclidean norm (cf. length of a vector).

Remark. Inner products let us define angles via
$$\cos\theta = \frac{x^T y}{\|x\| \|y\|}.$$
In particular, $x, y$ are orthogonal if and only if $x^T y = 0$.

6 Vector Norms

Example. Let $x \in \mathbb{R}^n$ and consider the Euclidean norm
$$\|x\|_2 = \sqrt{x^T x} = \left( \sum_{i=1}^n x_i^2 \right)^{1/2}.$$
We show that $\|\cdot\|_2$ is a norm. We do this for the real case, but the complex case goes analogously.
1 Clearly, $\|x\|_2 \ge 0$. Also,
$$\|x\|_2 = 0 \iff \|x\|_2^2 = 0 \iff \sum_{i=1}^n x_i^2 = 0 \iff x_i = 0,\ i = 1, \ldots, n \iff x = 0.$$

7 Vector Norms

Example (cont.)
2 We have
$$\|\alpha x\|_2 = \left( \sum_{i=1}^n (\alpha x_i)^2 \right)^{1/2} = |\alpha| \left( \sum_{i=1}^n x_i^2 \right)^{1/2} = |\alpha| \|x\|_2.$$
3 To establish (3) we need

Lemma (Cauchy–Schwarz–Bunyakovsky). Let $x, y \in \mathbb{R}^n$. Then
$$|x^T y| \le \|x\|_2 \|y\|_2.$$
Moreover, equality holds if and only if $y = \alpha x$ with $\alpha = \dfrac{x^T y}{\|x\|_2^2}$.

8 Vector Norms

Motivation for Proof of Cauchy–Schwarz–Bunyakovsky. As already alluded to above, the angle $\theta$ between two vectors $a$ and $b$ is related to the inner product by
$$\cos\theta = \frac{a^T b}{\|a\| \|b\|}.$$
Using trigonometry as in the figure, the projection of $a$ onto $b$ is then
$$\|a\| \cos\theta\, \frac{b}{\|b\|} = \|a\| \frac{a^T b}{\|a\| \|b\|} \frac{b}{\|b\|} = \frac{a^T b}{\|b\|^2}\, b.$$
Now we let $y = a$ and $x = b$, so that the projection of $y$ onto $x$ is given by $\alpha x$, where $\alpha = \dfrac{x^T y}{\|x\|_2^2}$.

9 Vector Norms

Proof of Cauchy–Schwarz–Bunyakovsky. We know that $\|y - \alpha x\|_2^2 \ge 0$ since it is the square of a norm. Therefore,
$$0 \le \|y - \alpha x\|_2^2 = (y - \alpha x)^T (y - \alpha x) = y^T y - 2\alpha x^T y + \alpha^2 x^T x = y^T y - 2 \frac{x^T y}{\|x\|_2^2} x^T y + \left( \frac{x^T y}{\|x\|_2^2} \right)^2 \underbrace{x^T x}_{= \|x\|_2^2} = \|y\|_2^2 - \left( \frac{x^T y}{\|x\|_2} \right)^2.$$
This implies $(x^T y)^2 \le \|x\|_2^2 \|y\|_2^2$, and the Cauchy–Schwarz–Bunyakovsky inequality follows by taking square roots.

10 Vector Norms

Proof (cont.) Now we look at the equality claim.
$\Leftarrow$: Assume $|x^T y| = \|x\|_2 \|y\|_2$. Then the first part of the proof shows that $\|y - \alpha x\|_2 = 0$, so that $y = \alpha x$.
$\Rightarrow$: Assume $y = \alpha x$. Then
$$|x^T y| = |x^T (\alpha x)| = |\alpha| \|x\|_2^2 \quad \text{and} \quad \|x\|_2 \|y\|_2 = \|x\|_2 \|\alpha x\|_2 = |\alpha| \|x\|_2^2,$$
so that we have equality. $\square$

11 Vector Norms

Example (cont.)
3 Now we can show that $\|\cdot\|_2$ satisfies the triangle inequality:
$$\|x + y\|_2^2 = (x + y)^T (x + y) = \underbrace{x^T x}_{= \|x\|_2^2} + x^T y + \underbrace{y^T x}_{= x^T y} + \underbrace{y^T y}_{= \|y\|_2^2} = \|x\|_2^2 + 2 x^T y + \|y\|_2^2 \le \|x\|_2^2 + 2 |x^T y| + \|y\|_2^2 \overset{\text{CSB}}{\le} \|x\|_2^2 + 2 \|x\|_2 \|y\|_2 + \|y\|_2^2 = \left( \|x\|_2 + \|y\|_2 \right)^2.$$
Now we just need to take square roots to have the triangle inequality.

12 Vector Norms

Lemma. Let $x, y \in \mathbb{R}^n$. Then we have the backward triangle inequality
$$\big| \|x\| - \|y\| \big| \le \|x - y\|.$$

Proof. We write
$$\|x\| = \|x - y + y\| \overset{\text{tri.ineq.}}{\le} \|x - y\| + \|y\|.$$
But this implies $\|x\| - \|y\| \le \|x - y\|$.

13 Vector Norms

Proof (cont.) Switch the roles of $x$ and $y$ to get
$$\|y\| - \|x\| \le \|y - x\| \iff -\left( \|x\| - \|y\| \right) \le \|x - y\|.$$
Together with the previous inequality we have $\big| \|x\| - \|y\| \big| \le \|x - y\|$. $\square$

14 Vector Norms

Other common norms
$\ell_1$-norm (or taxi-cab norm, Manhattan norm): $\|x\|_1 = \sum_{i=1}^n |x_i|$
$\ell_\infty$-norm (or maximum norm, Chebyshev norm): $\|x\|_\infty = \max_{1 \le i \le n} |x_i|$
$\ell_p$-norm: $\|x\|_p = \left( \sum_{i=1}^n |x_i|^p \right)^{1/p}$

Remark. In the homework you will use Hölder's and Minkowski's inequalities to show that the $p$-norm is a norm.

15 Vector Norms

Remark. We now show that $\|x\|_\infty = \lim_{p \to \infty} \|x\|_p$. Let's use tildes to mark all components of $x$ that are maximal in absolute value, i.e.,
$$|\tilde{x}_1| = |\tilde{x}_2| = \ldots = |\tilde{x}_k| = \max_{1 \le i \le n} |x_i|.$$
The remaining components are then $x_{k+1}, \ldots, x_n$. This implies that
$$\left| \frac{x_i}{\tilde{x}_1} \right| < 1, \quad i = k+1, \ldots, n.$$

16 Vector Norms

Remark (cont.) Now
$$\|x\|_p = \left( \sum_{i=1}^n |x_i|^p \right)^{1/p} = |\tilde{x}_1| \left( k + \underbrace{\left| \frac{x_{k+1}}{\tilde{x}_1} \right|^p}_{<1} + \ldots + \underbrace{\left| \frac{x_n}{\tilde{x}_1} \right|^p}_{<1} \right)^{1/p}.$$
Since the terms inside the parentheses except for $k$ go to $0$ for $p \to \infty$, the factor $(\cdot)^{1/p} \to 1$ for $p \to \infty$. And so
$$\|x\|_p \to |\tilde{x}_1| = \max_{1 \le i \le n} |x_i| = \|x\|_\infty.$$
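This limit is easy to watch numerically. Below is a small illustrative sketch (the vector and the values of $p$ are my own choices, not from the slides) showing $\|x\|_p$ creeping toward $\|x\|_\infty$ as $p$ grows:

```python
import numpy as np

# illustrative vector: two components are maximal in absolute value (k = 2)
x = np.array([3.0, -4.0, 1.0, 4.0])

for p in [1, 2, 4, 8, 16, 64, 256]:
    # ||x||_p = (sum |x_i|^p)^(1/p); np.linalg.norm implements this for ord=p
    print(p, np.linalg.norm(x, ord=p))

print("inf", np.linalg.norm(x, ord=np.inf))  # max_i |x_i| = 4
```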

17 Vector Norms

Figure: Unit balls in $\mathbb{R}^2$ for the $\ell_1$, $\ell_2$ and $\ell_\infty$ norms.

Note that $B_1 \subseteq B_2 \subseteq B_\infty$ since, e.g., for $x = \left( \tfrac{1}{2}, \tfrac{1}{2} \right)^T$,
$$\|x\|_1 = \tfrac{1}{2} + \tfrac{1}{2} = 1, \qquad \|x\|_2 = \sqrt{\left( \tfrac{1}{2} \right)^2 + \left( \tfrac{1}{2} \right)^2} = \tfrac{1}{\sqrt{2}} \le 1, \qquad \|x\|_\infty = \tfrac{1}{2} \le 1,$$
so this point lies on the boundary of $B_1$ but inside $B_2$ and $B_\infty$. In fact, we have in general (similar to HW)
$$\|x\|_1 \ge \|x\|_2 \ge \|x\|_\infty \quad \text{for any } x \in \mathbb{R}^n.$$

18 Vector Norms

Norm equivalence

Definition. Two norms $\|\cdot\|_*$ and $\|\cdot\|_{**}$ on a vector space $\mathcal{V}$ are called equivalent if there exist constants $\alpha, \beta$ such that
$$\alpha \le \frac{\|x\|_*}{\|x\|_{**}} \le \beta \quad \text{for all } x (\neq 0) \in \mathcal{V}.$$

Example. $\|\cdot\|_1$ and $\|\cdot\|_2$ are equivalent since from above $\|x\|_1 \ge \|x\|_2$ and also $\|x\|_1 \le \sqrt{n} \|x\|_2$ (see HW), so that
$$\alpha = 1 \le \frac{\|x\|_1}{\|x\|_2} \le \sqrt{n} = \beta.$$

Remark. In fact, all norms on finite-dimensional vector spaces are equivalent.

19 Matrix Norms

Matrix norms are special norms: they satisfy one additional property. This property should help us measure $\|AB\|$ for two matrices $A, B$ of appropriate sizes.

We look at the simplest matrix norm, the Frobenius norm, defined for $A \in \mathbb{R}^{m \times n}$:
$$\|A\|_F = \left( \sum_{i=1}^m \sum_{j=1}^n |a_{ij}|^2 \right)^{1/2} = \left( \sum_{i=1}^m \|A_{i*}\|_2^2 \right)^{1/2} = \left( \sum_{j=1}^n \|A_{*j}\|_2^2 \right)^{1/2} = \sqrt{\operatorname{trace}(A^T A)},$$
i.e., the Frobenius norm is just a 2-norm for the vector that contains all elements of the matrix.

20 Matrix Norms

Now
$$\|Ax\|_2^2 = \sum_{i=1}^m |A_{i*} x|^2 \overset{\text{CSB}}{\le} \underbrace{\sum_{i=1}^m \|A_{i*}\|_2^2}_{= \|A\|_F^2} \|x\|_2^2,$$
so that $\|Ax\|_2 \le \|A\|_F \|x\|_2$. We can generalize this to matrices, i.e., we have
$$\|AB\|_F \le \|A\|_F \|B\|_F,$$
which motivates us to require this submultiplicativity for any matrix norm.

21 Matrix Norms

Definition. A matrix norm $\|\cdot\|$ is a function from the set of all real (or complex) matrices of finite size into $\mathbb{R}_{\ge 0}$ that satisfies
1 $\|A\| \ge 0$ and $\|A\| = 0$ if and only if $A = O$ (a matrix of all zeros).
2 $\|\alpha A\| = |\alpha| \|A\|$ for all $\alpha \in \mathbb{R}$.
3 $\|A + B\| \le \|A\| + \|B\|$ (requires $A, B$ to be of the same size).
4 $\|AB\| \le \|A\| \|B\|$ (requires $A, B$ to have appropriate sizes).

Remark. This definition is usually too general. In addition to the Frobenius norm, the most useful matrix norms are induced by a vector norm.

22 Matrix Norms

Induced matrix norms

Theorem. Let $\|\cdot\|_{(m)}$ and $\|\cdot\|_{(n)}$ be vector norms on $\mathbb{R}^m$ and $\mathbb{R}^n$, respectively, and let $A$ be an $m \times n$ matrix. Then
$$\|A\| = \max_{\|x\|_{(n)} = 1} \|Ax\|_{(m)}$$
is a matrix norm called the induced matrix norm.

Remark. Here the vector norm could be any vector norm, in particular any $p$-norm. For example, we could have
$$\|A\|_2 = \max_{\|x\|_{2,(n)} = 1} \|Ax\|_{2,(m)}.$$
To keep notation simple we often drop indices.

23 Matrix Norms

Proof
1 $\|A\| \ge 0$ is obvious since this holds for the vector norm. It remains to show that $\|A\| = 0$ if and only if $A = O$. Assume $A = O$; then
$$\|A\| = \max_{\|x\| = 1} \underbrace{\|Ax\|}_{= 0} = 0.$$
So now consider $A \neq O$. We need to show that $\|A\| > 0$. There must exist a column of $A$ that is not $0$. We call this column $A_{*k}$ and take $x = e_k$. Then, since $\|e_k\| = 1$,
$$\|A\| = \max_{\|x\| = 1} \|Ax\| \ge \|A e_k\| = \|A_{*k}\| > 0$$
since $A_{*k} \neq 0$.

24 Matrix Norms

Proof (cont.)
2 Using the corresponding property for the vector norm we have
$$\|\alpha A\| = \max_{\|x\|=1} \|\alpha A x\| = |\alpha| \max_{\|x\|=1} \|Ax\| = |\alpha| \|A\|.$$
3 Also straightforward (based on the triangle inequality for the vector norm):
$$\|A + B\| = \max_{\|x\|=1} \|(A + B)x\| = \max_{\|x\|=1} \|Ax + Bx\| \le \max_{\|x\|=1} \left( \|Ax\| + \|Bx\| \right) \le \max_{\|x\|=1} \|Ax\| + \max_{\|x\|=1} \|Bx\| = \|A\| + \|B\|.$$

25 Matrix Norms

Proof (cont.)
4 First note that
$$\|A\| = \max_{\|x\|=1} \|Ax\| = \max_{x \neq 0} \frac{\|Ax\|}{\|x\|},$$
and so
$$\|Ax\| \le \|A\| \|x\|. \tag{1}$$
But then we also have $\|AB\| \le \|A\| \|B\|$ since
$$\|AB\| = \max_{\|x\|=1} \|ABx\| = \|ABy\| \quad \text{(for some } y \text{ with } \|y\| = 1\text{)} \overset{(1)}{\le} \|A\| \|By\| \overset{(1)}{\le} \|A\| \|B\| \underbrace{\|y\|}_{=1}. \qquad \square$$

26 Matrix Norms

Remark. One can show (see HW) that if $A$ is invertible
$$\min_{\|x\|=1} \|Ax\| = \frac{1}{\|A^{-1}\|}.$$
The induced matrix norm can be interpreted geometrically:
$\|A\|$: the most a vector on the unit sphere can be stretched when transformed by $A$.
$\frac{1}{\|A^{-1}\|}$: the most a vector on the unit sphere can be shrunk when transformed by $A$.

27 Matrix Norms

Matrix 2-norm

Theorem. Let $A$ be an $m \times n$ matrix. Then
1 $\|A\|_2 = \max_{\|x\|_2 = 1} \|Ax\|_2 = \sqrt{\lambda_{\max}}$,
2 $\|A^{-1}\|_2 = \dfrac{1}{\min_{\|x\|_2 = 1} \|Ax\|_2} = \dfrac{1}{\sqrt{\lambda_{\min}}}$,
where $\lambda_{\max}$ and $\lambda_{\min}$ are the largest and smallest eigenvalues of $A^T A$, respectively.

Remark. We also have $\sqrt{\lambda_{\max}} = \sigma_1$, the largest singular value of $A$, and $\sqrt{\lambda_{\min}} = \sigma_n$, the smallest singular value of $A$.
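A quick numerical check of this theorem (an illustrative sketch; the random test matrix is my own, not from the slides) compares $\|A\|_2$ with the square root of the largest eigenvalue of $A^T A$ and with the singular values:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))

# eigenvalues of the symmetric matrix A^T A, sorted ascending
evals = np.linalg.eigvalsh(A.T @ A)
lam_min, lam_max = evals[0], evals[-1]

print(np.linalg.norm(A, 2), np.sqrt(lam_max))   # ||A||_2 = sqrt(lambda_max)
print(np.linalg.svd(A, compute_uv=False))       # sigma_i = sqrt(lambda_i), descending
print(1 / np.sqrt(lam_min))                     # = 1/sigma_n (= ||A^{-1}||_2 for square A)
```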

28 Matrix Norms

Proof. We will show only (1) ((2) goes similarly). The idea is to solve a constrained optimization problem (as in calculus), i.e.,
maximize $f(x) = \|Ax\|_2^2 = (Ax)^T Ax$
subject to $g(x) = \|x\|_2^2 = x^T x = 1$.
We do this by introducing a Lagrange multiplier $\lambda$ and defining
$$h(x, \lambda) = f(x) - \lambda g(x) = x^T A^T A x - \lambda x^T x.$$

29 Matrix Norms

Proof (cont.) Necessary (and, since $h$ is quadratic, sufficient) condition for a maximum: $\frac{\partial h}{\partial x_i} = 0$, $i = 1, \ldots, n$, together with $g(x) = 1$. Now
$$\frac{\partial}{\partial x_i} \left( x^T A^T A x - \lambda x^T x \right) = \left( \frac{\partial x}{\partial x_i} \right)^T A^T A x + x^T A^T A \frac{\partial x}{\partial x_i} - \lambda \left( \frac{\partial x}{\partial x_i} \right)^T x - \lambda x^T \frac{\partial x}{\partial x_i} = 2 e_i^T A^T A x - 2 \lambda e_i^T x = 2 \left( (A^T A x)_i - (\lambda x)_i \right), \quad i = 1, \ldots, n.$$
Together this yields
$$A^T A x - \lambda x = 0 \iff \left( A^T A - \lambda I \right) x = 0,$$
so that $\lambda$ must be an eigenvalue of $A^T A$ (since $g(x) = x^T x = 1$ ensures $x \neq 0$).

30 Matrix Norms

Proof (cont.) In fact, as we now show, the maximum is attained at the maximal eigenvalue. First,
$$A^T A x = \lambda x \implies x^T A^T A x = \lambda x^T x = \lambda,$$
so that
$$\|Ax\|_2 = \sqrt{x^T A^T A x} = \sqrt{\lambda}.$$
And then
$$\|A\|_2 = \max_{\|x\|_2 = 1} \|Ax\|_2 = \max_{\|x\|_2 = 1} \sqrt{x^T A^T A x} = \max_\lambda \sqrt{\lambda} = \sqrt{\lambda_{\max}}. \qquad \square$$

31 Matrix Norms

Special properties of the 2-norm
1 $\|A\|_2 = \max_{\|x\|_2 = 1} \max_{\|y\|_2 = 1} |y^T A x|$
2 $\|A\|_2 = \|A^T\|_2$
3 $\|A^T A\|_2 = \|A\|_2^2 = \|A A^T\|_2$
4 $\left\| \begin{pmatrix} A & O \\ O & B \end{pmatrix} \right\|_2 = \max \{ \|A\|_2, \|B\|_2 \}$
5 $\|U^T A V\|_2 = \|A\|_2$ provided $U U^T = I$ and $V^T V = I$ (orthogonal matrices).

Remark. The proof is a HW problem.

32 Matrix Norms

Matrix 1-norm and ∞-norm

Theorem. Let $A$ be an $m \times n$ matrix. Then we have
1 the column sum norm
$$\|A\|_1 = \max_{\|x\|_1 = 1} \|Ax\|_1 = \max_{j = 1, \ldots, n} \sum_{i=1}^m |a_{ij}|,$$
2 and the row sum norm
$$\|A\|_\infty = \max_{\|x\|_\infty = 1} \|Ax\|_\infty = \max_{i = 1, \ldots, m} \sum_{j=1}^n |a_{ij}|.$$

Remark. We know these are norms, so what we need to do is verify that the formulas hold. We will show (1).
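Both formulas are easy to verify numerically. The following sketch (the small test matrix is my own choice) compares the column/row sum formulas with NumPy's induced norms:

```python
import numpy as np

A = np.array([[1.0, -2.0,  3.0],
              [4.0,  5.0, -6.0]])

col_sums = np.abs(A).sum(axis=0)   # one absolute sum per column
row_sums = np.abs(A).sum(axis=1)   # one absolute sum per row

print(col_sums.max(), np.linalg.norm(A, 1))       # ||A||_1  = max column sum
print(row_sums.max(), np.linalg.norm(A, np.inf))  # ||A||_inf = max row sum
```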

33 Matrix Norms

Proof. First we look at $\|Ax\|_1$:
$$\|Ax\|_1 = \sum_{i=1}^m |(Ax)_i| = \sum_{i=1}^m \left| \sum_{j=1}^n a_{ij} x_j \right| \le \sum_{i=1}^m \sum_{j=1}^n |a_{ij}| |x_j| = \sum_{j=1}^n |x_j| \left[ \sum_{i=1}^m |a_{ij}| \right] \le \left[ \max_{j=1,\ldots,n} \sum_{i=1}^m |a_{ij}| \right] \sum_{j=1}^n |x_j|.$$
Since we actually need to look at $\|Ax\|_1$ for $\|x\|_1 = 1$, we note that $\|x\|_1 = \sum_{j=1}^n |x_j| = 1$ and therefore have
$$\|Ax\|_1 \le \max_{j=1,\ldots,n} \sum_{i=1}^m |a_{ij}|.$$

34 Matrix Norms

Proof (cont.) We even have equality since for $x = e_k$, where $k$ is the index such that $A_{*k}$ has maximum column sum, we get
$$\|Ax\|_1 = \|A e_k\|_1 = \|A_{*k}\|_1 = \sum_{i=1}^m |a_{ik}| = \max_{j=1,\ldots,n} \sum_{i=1}^m |a_{ij}|$$
due to our choice of $k$. Since $\|e_k\|_1 = 1$ we indeed have the desired formula. $\square$

35 Inner Product Spaces

Definition. A general inner product on a real (complex) vector space $\mathcal{V}$ is a symmetric (Hermitian) bilinear form $\langle \cdot, \cdot \rangle : \mathcal{V} \times \mathcal{V} \to \mathbb{R}$ ($\mathbb{C}$) such that
1 $\langle x, x \rangle \in \mathbb{R}_{\ge 0}$ with $\langle x, x \rangle = 0$ if and only if $x = 0$.
2 $\langle x, \alpha y \rangle = \alpha \langle x, y \rangle$ for all scalars $\alpha$.
3 $\langle x, y + z \rangle = \langle x, y \rangle + \langle x, z \rangle$.
4 $\langle x, y \rangle = \langle y, x \rangle$ (or $\langle x, y \rangle = \overline{\langle y, x \rangle}$ if complex).

Remark. The following two properties (providing bilinearity) are implied (see HW):
$$\langle \alpha x, y \rangle = \alpha \langle x, y \rangle, \qquad \langle x + y, z \rangle = \langle x, z \rangle + \langle y, z \rangle.$$

36 Inner Product Spaces

As before, any inner product induces a norm via
$$\|\cdot\| = \sqrt{\langle \cdot, \cdot \rangle}.$$
One can show (analogously to the Euclidean case) that $\|\cdot\|$ is a norm. In particular, we have a general Cauchy–Schwarz–Bunyakovsky inequality
$$|\langle x, y \rangle| \le \|x\| \|y\|.$$

37 Inner Product Spaces

Example
1 $\langle x, y \rangle = x^T y$ (or $x^* y$), the standard inner product for $\mathbb{R}^n$ ($\mathbb{C}^n$).
2 For nonsingular matrices $A$ we get the $A$-inner product on $\mathbb{R}^n$, i.e., $\langle x, y \rangle = x^T A^T A y$ with
$$\|x\|_A = \sqrt{\langle x, x \rangle} = \sqrt{x^T A^T A x} = \|Ax\|_2.$$
3 If $\mathcal{V} = \mathbb{R}^{m \times n}$ (or $\mathbb{C}^{m \times n}$) then we get the standard inner product for matrices, i.e.,
$$\langle A, B \rangle = \operatorname{trace}(A^T B) \quad (\text{or } \operatorname{trace}(A^* B))$$
with $\|A\| = \sqrt{\langle A, A \rangle} = \sqrt{\operatorname{trace}(A^T A)} = \|A\|_F$.

38 Inner Product Spaces

Remark. In the infinite-dimensional setting we have, e.g., for $f, g$ continuous functions on $(a, b)$,
$$\langle f, g \rangle = \int_a^b f(t) g(t)\, dt \quad \text{with} \quad \|f\| = \sqrt{\langle f, f \rangle} = \left( \int_a^b (f(t))^2\, dt \right)^{1/2}.$$

39 Inner Product Spaces

Parallelogram identity

In any inner product space the so-called parallelogram identity holds, i.e.,
$$\|x + y\|^2 + \|x - y\|^2 = 2 \left( \|x\|^2 + \|y\|^2 \right). \tag{2}$$
This is true since
$$\|x + y\|^2 + \|x - y\|^2 = \langle x + y, x + y \rangle + \langle x - y, x - y \rangle = \langle x, x \rangle + \langle x, y \rangle + \langle y, x \rangle + \langle y, y \rangle + \langle x, x \rangle - \langle x, y \rangle - \langle y, x \rangle + \langle y, y \rangle = 2 \langle x, x \rangle + 2 \langle y, y \rangle = 2 \left( \|x\|^2 + \|y\|^2 \right).$$

40 Inner Product Spaces

Polarization identity

The following theorem shows that not only do we get a norm from an inner product (i.e., every Hilbert space is a Banach space), but if the parallelogram identity holds then we can get an inner product from a norm (i.e., a Banach space becomes a Hilbert space).

Theorem. Let $\mathcal{V}$ be a real vector space with norm $\|\cdot\|$. If the parallelogram identity (2) holds then
$$\langle x, y \rangle = \frac{1}{4} \left( \|x + y\|^2 - \|x - y\|^2 \right) \tag{3}$$
is an inner product on $\mathcal{V}$.

41 Inner Product Spaces

Proof. We need to show that all four properties of a general inner product hold.
1 Nonnegativity:
$$\langle x, x \rangle = \frac{1}{4} \left( \|x + x\|^2 - \|x - x\|^2 \right) = \frac{1}{4} \|2x\|^2 = \|x\|^2 \ge 0.$$
Moreover, $\langle x, x \rangle = 0$ if and only if $x = 0$, since $\langle x, x \rangle = \|x\|^2$.
4 Symmetry: $\langle x, y \rangle = \langle y, x \rangle$ is clear since $\|x - y\| = \|y - x\|$.

42 Inner Product Spaces

Proof (cont.)
3 Additivity: The parallelogram identity implies
$$\|x + y\|^2 + \|x + z\|^2 = \frac{1}{2} \left( \|2x + y + z\|^2 + \|y - z\|^2 \right) \tag{4}$$
and
$$\|x - y\|^2 + \|x - z\|^2 = \frac{1}{2} \left( \|2x - y - z\|^2 + \|z - y\|^2 \right). \tag{5}$$
Subtracting (5) from (4) we get
$$\|x + y\|^2 - \|x - y\|^2 + \|x + z\|^2 - \|x - z\|^2 = \frac{1}{2} \left( \|2x + y + z\|^2 - \|2x - y - z\|^2 \right). \tag{6}$$

43 Inner Product Spaces

Proof (cont.) The specific form of the polarized inner product implies
$$\langle x, y \rangle + \langle x, z \rangle = \frac{1}{4} \left( \|x + y\|^2 - \|x - y\|^2 + \|x + z\|^2 - \|x - z\|^2 \right) \overset{(6)}{=} \frac{1}{8} \left( \|2x + y + z\|^2 - \|2x - y - z\|^2 \right) = \frac{1}{2} \left( \left\| x + \frac{y + z}{2} \right\|^2 - \left\| x - \frac{y + z}{2} \right\|^2 \right) \overset{\text{polarization}}{=} 2 \left\langle x, \frac{y + z}{2} \right\rangle. \tag{7}$$
Setting $z = 0$ in (7) yields
$$\langle x, y \rangle = 2 \left\langle x, \frac{y}{2} \right\rangle \tag{8}$$
since $\langle x, 0 \rangle = 0$.

44 Inner Product Spaces

Proof (cont.) To summarize, we have
$$\langle x, y \rangle + \langle x, z \rangle = 2 \left\langle x, \frac{y + z}{2} \right\rangle \tag{7}$$
and
$$\langle x, y \rangle = 2 \left\langle x, \frac{y}{2} \right\rangle. \tag{8}$$
Since (8) is true for any $y \in \mathcal{V}$ we can, in particular, replace $y$ by $y + z$ so that we have
$$\langle x, y + z \rangle = 2 \left\langle x, \frac{y + z}{2} \right\rangle.$$
This, however, is the right-hand side of (7), so that we end up with
$$\langle x, y + z \rangle = \langle x, y \rangle + \langle x, z \rangle,$$
as desired.

45 Inner Product Spaces

Proof (cont.)
2 Scalar multiplication: To show $\langle x, \alpha y \rangle = \alpha \langle x, y \rangle$ for integer $\alpha$ we can just repeatedly apply the additivity property just proved. From this we can get the property for rational $\alpha$ as follows. We let $\alpha = \frac{\beta}{\gamma}$ with integers $\beta, \gamma \neq 0$, so that
$$\beta \gamma \langle x, y \rangle = \langle \gamma x, \beta y \rangle = \gamma^2 \left\langle x, \frac{\beta}{\gamma} y \right\rangle.$$
Dividing by $\gamma^2$ we get
$$\frac{\beta}{\gamma} \langle x, y \rangle = \left\langle x, \frac{\beta}{\gamma} y \right\rangle.$$

46 Inner Product Spaces

Proof (cont.) Finally, for real $\alpha$ we use the continuity of the norm function (see HW), which implies that our inner product $\langle \cdot, \cdot \rangle$ is also continuous. Now we take a sequence $\{\alpha_n\}$ of rational numbers such that $\alpha_n \to \alpha$ for $n \to \infty$ and have by continuity
$$\langle x, \alpha_n y \rangle \to \langle x, \alpha y \rangle \quad \text{as } n \to \infty. \qquad \square$$

47 Inner Product Spaces

Theorem. The only vector $p$-norm induced by an inner product is the 2-norm.

Remark. Since many problems are more easily dealt with in inner product spaces (since we then have lengths and angles, see next section), the 2-norm has a clear advantage over other $p$-norms.

48 Inner Product Spaces

Proof. We know that the 2-norm does induce an inner product, i.e., $\|x\|_2 = \sqrt{x^T x}$. Therefore we need to show that it doesn't work for $p \neq 2$. We do this by showing that the parallelogram identity
$$\|x + y\|^2 + \|x - y\|^2 = 2 \left( \|x\|^2 + \|y\|^2 \right)$$
fails for $p \neq 2$. We will do this for $1 \le p < \infty$. You will work out the case $p = \infty$ in a HW problem.

49 Inner Product Spaces

Proof (cont.) All we need is a counterexample, so we take $x = e_1$ and $y = e_2$, so that
$$\|x + y\|_p^2 = \|e_1 + e_2\|_p^2 = \left( \sum_{i=1}^n |[e_1 + e_2]_i|^p \right)^{2/p} = 2^{2/p}$$
and, similarly,
$$\|x - y\|_p^2 = \|e_1 - e_2\|_p^2 = 2^{2/p}.$$
Together, the left-hand side of the parallelogram identity is
$$2 \left( 2^{2/p} \right) = 2^{2/p + 1}.$$

50 Inner Product Spaces

Proof (cont.) For the right-hand side of the parallelogram identity we calculate
$$\|x\|_p^2 = \|e_1\|_p^2 = 1 = \|e_2\|_p^2 = \|y\|_p^2,$$
so that the right-hand side comes out to $4$. Finally, we have
$$2^{2/p + 1} = 4 \iff \frac{2}{p} + 1 = 2 \iff \frac{2}{p} = 1 \iff p = 2. \qquad \square$$

51 Orthogonal Vectors

We will now work in a general inner product space $\mathcal{V}$ with induced norm $\|\cdot\| = \sqrt{\langle \cdot, \cdot \rangle}$.

Definition. Two vectors $x, y \in \mathcal{V}$ are called orthogonal if
$$\langle x, y \rangle = 0.$$
We often use the notation $x \perp y$.

52 Orthogonal Vectors

In the HW you will prove the Pythagorean theorem for the 2-norm and standard inner product $x^T y$, i.e.,
$$\|x\|^2 + \|y\|^2 = \|x - y\|^2 \iff x^T y = 0.$$
Moreover, the law of cosines states
$$\|x - y\|^2 = \|x\|^2 + \|y\|^2 - 2 \|x\| \|y\| \cos\theta,$$
so that
$$\cos\theta = \frac{\|x\|^2 + \|y\|^2 - \|x - y\|^2}{2 \|x\| \|y\|} \overset{\text{Pythagoras}}{=} \frac{2 x^T y}{2 \|x\| \|y\|}.$$
This motivates our general definition of angles:

Definition. Let $x, y \in \mathcal{V}$. The angle between $x$ and $y$ is defined via
$$\cos\theta = \frac{\langle x, y \rangle}{\|x\| \|y\|}, \quad \theta \in [0, \pi].$$

53 Orthogonal Vectors

Orthonormal sets

Definition. A set $\{u_1, u_2, \ldots, u_n\} \subseteq \mathcal{V}$ is called orthonormal if
$$\langle u_i, u_j \rangle = \delta_{ij} \quad \text{(Kronecker delta)}.$$

Theorem. Every orthonormal set is linearly independent.

Corollary. Every orthonormal set of $n$ vectors from an $n$-dimensional vector space $\mathcal{V}$ is an orthonormal basis for $\mathcal{V}$.

54 Orthogonal Vectors

Proof (of the theorem). We want to show linear independence, i.e., that
$$\sum_{j=1}^n \alpha_j u_j = 0 \implies \alpha_j = 0, \quad j = 1, \ldots, n.$$
To see this is true we take the inner product with $u_i$:
$$\left\langle u_i, \sum_{j=1}^n \alpha_j u_j \right\rangle = \langle u_i, 0 \rangle \iff \sum_{j=1}^n \alpha_j \underbrace{\langle u_i, u_j \rangle}_{= \delta_{ij}} = 0 \iff \alpha_i = 0.$$
Since $i$ was arbitrary this holds for all $i = 1, \ldots, n$, and we have linear independence. $\square$

55 Orthogonal Vectors

Example. The standard orthonormal basis of $\mathbb{R}^n$ is given by $\{e_1, e_2, \ldots, e_n\}$. Using this basis we can express any $x \in \mathbb{R}^n$ as
$$x = x_1 e_1 + x_2 e_2 + \ldots + x_n e_n,$$
i.e., we get a coordinate expansion of $x$.

56 Orthogonal Vectors

In fact, any other orthonormal basis provides just as simple a representation of $x$: consider the orthonormal basis $B = \{u_1, u_2, \ldots, u_n\}$ and assume
$$x = \sum_{j=1}^n \alpha_j u_j$$
for some appropriate scalars $\alpha_j$. To find these expansion coefficients $\alpha_j$ we take the inner product with $u_i$, i.e.,
$$\langle u_i, x \rangle = \sum_{j=1}^n \alpha_j \underbrace{\langle u_i, u_j \rangle}_{= \delta_{ij}} = \alpha_i.$$

57 Orthogonal Vectors

We therefore have proved

Theorem. Let $\{u_1, u_2, \ldots, u_n\}$ be an orthonormal basis for an inner product space $\mathcal{V}$. Then any $x \in \mathcal{V}$ can be written as
$$x = \sum_{i=1}^n \langle x, u_i \rangle u_i.$$
This is a (finite) Fourier expansion with Fourier coefficients $\langle x, u_i \rangle$.
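A minimal numerical illustration of this theorem (my own sketch: the orthonormal basis is taken from a QR factorization of a random matrix, which is an assumption of the demo, not part of the slides):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4

# Columns of Q form an orthonormal basis of R^n (Q from a QR factorization)
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))

x = rng.standard_normal(n)

# Fourier coefficients <x, u_i> = u_i^T x, then reassemble x from the expansion
coeffs = Q.T @ x
x_reconstructed = Q @ coeffs   # = sum_i <x, u_i> u_i

print(np.allclose(x, x_reconstructed))  # True
```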

58 Orthogonal Vectors

Remark. The classical (infinite-dimensional) Fourier series for continuous functions on $(-\pi, \pi)$ uses the orthogonal (but not yet orthonormal) basis
$$\{1, \sin t, \cos t, \sin 2t, \cos 2t, \ldots\}$$
and the inner product
$$\langle f, g \rangle = \int_{-\pi}^{\pi} f(t) g(t)\, dt.$$

59 Orthogonal Vectors

Example. Consider a basis $B = \{u_1, u_2, u_3\}$ of $\mathbb{R}^3$ for which it is clear by inspection that $B$ is an orthogonal subset of $\mathbb{R}^3$, i.e., using the Euclidean inner product, we have
$$u_i^T u_j = 0, \quad i, j = 1, 2, 3, \ i \neq j.$$
We can obtain an orthonormal basis by normalizing the vectors, i.e., by computing
$$v_i = \frac{u_i}{\|u_i\|_2}, \quad i = 1, 2, 3.$$

60 Orthogonal Vectors

Example (cont.) The Fourier expansion of a given $x \in \mathbb{R}^3$ with respect to this basis is then
$$x = \sum_{i=1}^3 (x^T v_i)\, v_i.$$

61 Gram–Schmidt Orthogonalization & QR Factorization

We want to convert an arbitrary basis $\{x_1, x_2, \ldots, x_n\}$ of $\mathcal{V}$ to an orthonormal basis $\{u_1, u_2, \ldots, u_n\}$.

Idea: construct $u_1, u_2, \ldots, u_n$ successively so that $\{u_1, u_2, \ldots, u_k\}$ is an ON basis for $\operatorname{span}\{x_1, x_2, \ldots, x_k\}$, $k = 1, \ldots, n$.

62 Gram–Schmidt Orthogonalization & QR Factorization

Construction
$k = 1$: $u_1 = \dfrac{x_1}{\|x_1\|}$.
$k = 2$: Consider the projection of $x_2$ onto $u_1$, i.e., $\langle u_1, x_2 \rangle u_1$. Then
$$v_2 = x_2 - \langle u_1, x_2 \rangle u_1 \quad \text{and} \quad u_2 = \frac{v_2}{\|v_2\|}.$$

63 Gram–Schmidt Orthogonalization & QR Factorization

In general, consider $\{u_1, \ldots, u_k\}$ as a given ON basis for $\operatorname{span}\{x_1, \ldots, x_k\}$. Use the Fourier expansion to express $x_{k+1}$ with respect to $\{u_1, \ldots, u_{k+1}\}$:
$$x_{k+1} = \sum_{i=1}^{k+1} \langle u_i, x_{k+1} \rangle u_i = \sum_{i=1}^{k} \langle u_i, x_{k+1} \rangle u_i + \langle u_{k+1}, x_{k+1} \rangle u_{k+1},$$
so that
$$u_{k+1} = \frac{x_{k+1} - \sum_{i=1}^k \langle u_i, x_{k+1} \rangle u_i}{\langle u_{k+1}, x_{k+1} \rangle} = \frac{v_{k+1}}{\langle u_{k+1}, x_{k+1} \rangle}.$$
This vector, however, is not yet normalized.

64 Gram–Schmidt Orthogonalization & QR Factorization

We now want $\|u_{k+1}\| = 1$, i.e.,
$$\left\| \frac{v_{k+1}}{\langle u_{k+1}, x_{k+1} \rangle} \right\| = \frac{\|v_{k+1}\|}{|\langle u_{k+1}, x_{k+1} \rangle|} = 1 \iff |\langle u_{k+1}, x_{k+1} \rangle| = \|v_{k+1}\|.$$
Therefore
$$\langle u_{k+1}, x_{k+1} \rangle = \pm \left\| x_{k+1} - \sum_{i=1}^k \langle u_i, x_{k+1} \rangle u_i \right\|.$$
Since the factor $\pm 1$ changes neither the span, nor orthogonality, nor normalization, we can pick the positive sign.

65 Gram–Schmidt Orthogonalization & QR Factorization

Gram–Schmidt algorithm

Summarizing, we have
$$u_1 = \frac{x_1}{\|x_1\|}, \qquad v_k = x_k - \sum_{i=1}^{k-1} \langle u_i, x_k \rangle u_i, \quad u_k = \frac{v_k}{\|v_k\|}, \quad k = 2, \ldots, n.$$
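A direct transcription of this algorithm into NumPy might look as follows (a sketch under the assumption that the input columns are linearly independent; the function name and test data are my own):

```python
import numpy as np

def gram_schmidt(X):
    """Classical Gram-Schmidt: orthonormalize the columns of X.

    Assumes the columns x_1, ..., x_n are linearly independent.
    Returns U with orthonormal columns spanning the same space.
    """
    m, n = X.shape
    U = np.zeros((m, n))
    for k in range(n):
        # v_k = x_k - sum_i <u_i, x_k> u_i  (subtract projections onto previous u_i)
        v = X[:, k] - U[:, :k] @ (U[:, :k].T @ X[:, k])
        U[:, k] = v / np.linalg.norm(v)
    return U

X = np.random.default_rng(2).standard_normal((5, 3))
U = gram_schmidt(X)
print(np.allclose(U.T @ U, np.eye(3)))  # columns are orthonormal
```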

66 Gram–Schmidt Orthogonalization & QR Factorization

Using matrix notation to describe Gram–Schmidt

We will assume $\mathcal{V} \subseteq \mathbb{R}^m$ (but this also works in the complex case). Let
$$U_1 = \begin{pmatrix} 0 \\ \vdots \\ 0 \end{pmatrix} \in \mathbb{R}^m$$
and for $k = 2, 3, \ldots, n$ let
$$U_k = \begin{pmatrix} u_1 & u_2 & \cdots & u_{k-1} \end{pmatrix} \in \mathbb{R}^{m \times (k-1)}.$$

67 Gram–Schmidt Orthogonalization & QR Factorization

Then
$$U_k^T x_k = \begin{pmatrix} u_1^T x_k \\ u_2^T x_k \\ \vdots \\ u_{k-1}^T x_k \end{pmatrix}$$
and
$$U_k U_k^T x_k = \begin{pmatrix} u_1 & u_2 & \cdots & u_{k-1} \end{pmatrix} \begin{pmatrix} u_1^T x_k \\ u_2^T x_k \\ \vdots \\ u_{k-1}^T x_k \end{pmatrix} = \sum_{i=1}^{k-1} u_i (u_i^T x_k) = \sum_{i=1}^{k-1} (u_i^T x_k) u_i.$$

68 Gram–Schmidt Orthogonalization & QR Factorization

Now, Gram–Schmidt says
$$v_k = x_k - \sum_{i=1}^{k-1} (u_i^T x_k) u_i = x_k - U_k U_k^T x_k = \left( I - U_k U_k^T \right) x_k, \quad k = 1, 2, \ldots, n,$$
where the case $k = 1$ is also covered by the special definition of $U_1$.

Remark. $U_k U_k^T$ is a projection matrix, and $I - U_k U_k^T$ is a complementary projection. We will cover these later.

69 Gram–Schmidt Orthogonalization & QR Factorization

QR Factorization (via Gram–Schmidt)

Consider an $m \times n$ matrix $A$ with $\operatorname{rank}(A) = n$. We want to convert the set of columns of $A$, $\{a_1, a_2, \ldots, a_n\}$, to an ON basis $\{q_1, q_2, \ldots, q_n\}$ of $R(A)$. From our discussion of Gram–Schmidt we know
$$q_1 = \frac{a_1}{\|a_1\|}, \qquad v_k = a_k - \sum_{i=1}^{k-1} \langle q_i, a_k \rangle q_i, \quad q_k = \frac{v_k}{\|v_k\|}, \quad k = 2, \ldots, n.$$

70 Gram–Schmidt Orthogonalization & QR Factorization

We now rewrite as follows:
$$a_1 = \|a_1\| q_1, \qquad a_k = \langle q_1, a_k \rangle q_1 + \ldots + \langle q_{k-1}, a_k \rangle q_{k-1} + \|v_k\| q_k, \quad k = 2, \ldots, n.$$
We also introduce the new notation
$$r_{11} = \|a_1\|, \qquad r_{kk} = \|v_k\|, \quad k = 2, \ldots, n.$$
Then
$$A = \begin{pmatrix} a_1 & a_2 & \cdots & a_n \end{pmatrix} = \underbrace{\begin{pmatrix} q_1 & q_2 & \cdots & q_n \end{pmatrix}}_{= Q} \underbrace{\begin{pmatrix} r_{11} & \langle q_1, a_2 \rangle & \cdots & \langle q_1, a_n \rangle \\ & r_{22} & \cdots & \langle q_2, a_n \rangle \\ & & \ddots & \vdots \\ O & & & r_{nn} \end{pmatrix}}_{= R},$$
and we have the reduced QR factorization of $A$.

71 Gram–Schmidt Orthogonalization & QR Factorization

Remark
The matrix $Q$ is $m \times n$ with orthonormal columns.
The matrix $R$ is $n \times n$ upper triangular with positive diagonal entries.
The reduced QR factorization is unique (see HW).

72 Gram–Schmidt Orthogonalization & QR Factorization

Example. Find the QR factorization of a given $3 \times 3$ matrix $A = \begin{pmatrix} a_1 & a_2 & a_3 \end{pmatrix}$. First,
$$q_1 = \frac{a_1}{\|a_1\|}, \qquad r_{11} = \|a_1\|.$$
Then
$$v_2 = a_2 - (q_1^T a_2)\, q_1, \qquad r_{12} = q_1^T a_2, \qquad r_{22} = \|v_2\|, \qquad q_2 = \frac{v_2}{\|v_2\|}.$$

73 Gram–Schmidt Orthogonalization & QR Factorization

Example (cont.)
$$v_3 = a_3 - (q_1^T a_3)\, q_1 - (q_2^T a_3)\, q_2 \qquad \text{with} \qquad r_{13} = q_1^T a_3, \qquad r_{23} = q_2^T a_3 = 0.$$
Thus $r_{33} = \|v_3\|$ and
$$q_3 = \frac{v_3}{\|v_3\|}.$$

74 Gram–Schmidt Orthogonalization & QR Factorization

Example (cont.) Together we have
$$Q = \begin{pmatrix} q_1 & q_2 & q_3 \end{pmatrix}, \qquad R = \begin{pmatrix} r_{11} & r_{12} & r_{13} \\ 0 & r_{22} & r_{23} \\ 0 & 0 & r_{33} \end{pmatrix}.$$

75 Gram–Schmidt Orthogonalization & QR Factorization

Solving linear systems with the QR factorization

Recall the use of the LU factorization to solve $Ax = b$. Now, $A = QR$ implies
$$Ax = b \iff QRx = b.$$
In the special case of a nonsingular $n \times n$ matrix $A$ the matrix $Q$ is also $n \times n$ with ON columns, so that $Q^{-1} = Q^T$ (since $Q^T Q = I$) and
$$QRx = b \iff Rx = Q^T b.$$

76 Gram–Schmidt Orthogonalization & QR Factorization

Therefore we solve $Ax = b$ by the following steps:
1 Compute $A = QR$.
2 Compute $y = Q^T b$.
3 Solve the upper triangular system $Rx = y$.

Remark. This procedure is comparable to the three-step LU solution procedure.
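The three steps translate directly into library calls. The sketch below uses NumPy's qr and SciPy's solve_triangular for the back substitution (these particular routines, and the random test system, are assumptions of the demo, not part of the slides):

```python
import numpy as np
from scipy.linalg import solve_triangular

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 4))
b = rng.standard_normal(4)

Q, R = np.linalg.qr(A)          # step 1: A = QR
y = Q.T @ b                     # step 2: y = Q^T b
x = solve_triangular(R, y)      # step 3: back substitution on Rx = y

print(np.allclose(A @ x, b))    # True
```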

77 Gram–Schmidt Orthogonalization & QR Factorization

The real advantage of the QR factorization lies in the solution of least squares problems. Consider $Ax = b$ with $A \in \mathbb{R}^{m \times n}$ and $\operatorname{rank}(A) = n$ (so that a unique least squares solution exists). We know that the least squares solution is given by the solution of the normal equations
$$A^T A x = A^T b.$$
Using the QR factorization of $A$ this becomes
$$(QR)^T QRx = (QR)^T b \iff R^T \underbrace{Q^T Q}_{= I} Rx = R^T Q^T b \iff R^T Rx = R^T Q^T b.$$
Now $R$ is upper triangular with positive diagonal and therefore invertible. Therefore solving the normal equations corresponds to solving (cf. the previous discussion)
$$Rx = Q^T b.$$

78 Gram–Schmidt Orthogonalization & QR Factorization

Remark. This is the same as the QR factorization applied to a square and consistent system $Ax = b$.

Summary
The QR factorization provides a simple and efficient way to solve least squares problems.
The ill-conditioned matrix $A^T A$ is never computed.
If it is required, then it can be computed from $R$ as $A^T A = R^T R$ (in fact, this is the Cholesky factorization of $A^T A$).

79 Gram–Schmidt Orthogonalization & QR Factorization

Modified Gram–Schmidt

There is still a problem with the QR factorization via Gram–Schmidt: it is not numerically stable (see HW). A better, but still not ideal, approach is provided by the modified Gram–Schmidt algorithm.

Idea: rearrange the order of calculation, i.e., write the projection matrices
$$U_k U_k^T = \sum_{i=1}^{k-1} u_i u_i^T$$
as a sum of rank-1 projections.

80 Gram–Schmidt Orthogonalization & QR Factorization

MGS Algorithm

$k = 1$: $u_1 \leftarrow \dfrac{x_1}{\|x_1\|}$, $u_j \leftarrow x_j$, $j = 2, \ldots, n$
for $k = 2 : n$
  $E_k = I - u_{k-1} u_{k-1}^T$
  for $j = k, \ldots, n$
    $u_j \leftarrow E_k u_j$
  $u_k \leftarrow \dfrac{u_k}{\|u_k\|}$
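In code the rank-1 projections $E_k = I - u_{k-1} u_{k-1}^T$ are never formed explicitly; they are applied to all remaining columns at once. A minimal NumPy sketch of this idea (function name and layout are my own):

```python
import numpy as np

def modified_gram_schmidt(X):
    """Modified Gram-Schmidt: orthonormalize the columns of X.

    Each accepted u_{k-1} is immediately projected out of all remaining
    columns (a sum of rank-1 projections), which preserves orthogonality
    better in floating-point arithmetic than classical Gram-Schmidt.
    Assumes linearly independent columns.
    """
    U = X.astype(float).copy()
    n = U.shape[1]
    U[:, 0] /= np.linalg.norm(U[:, 0])
    for k in range(1, n):
        u_prev = U[:, k - 1]
        # apply E_k = I - u_{k-1} u_{k-1}^T to columns k, ..., n-1
        U[:, k:] -= np.outer(u_prev, u_prev @ U[:, k:])
        U[:, k] /= np.linalg.norm(U[:, k])
    return U
```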

81 Gram–Schmidt Orthogonalization & QR Factorization

Remark
The MGS algorithm is theoretically equivalent to the GS algorithm, i.e., in exact arithmetic, but in practice it preserves orthogonality better.
Most stable implementations of the QR factorization use Householder reflections or Givens rotations (more later).
Householder reflections are also more efficient than MGS.

82 Unitary and Orthogonal Matrices

Definition. A real (complex) $n \times n$ matrix is called orthogonal (unitary) if its columns form an orthonormal basis for $\mathbb{R}^n$ ($\mathbb{C}^n$).

Theorem. Let $U$ be an orthogonal $n \times n$ matrix. Then
1 $U$ has orthonormal rows.
2 $U^{-1} = U^T$.
3 $\|Ux\|_2 = \|x\|_2$ for all $x \in \mathbb{R}^n$, i.e., $U$ is an isometry.

Remark. Analogous properties for unitary matrices are formulated and proved in [Mey00].

83 Unitary and Orthogonal Matrices

Proof
2 By definition $U = \begin{pmatrix} u_1 & \cdots & u_n \end{pmatrix}$ has orthonormal columns, i.e.,
$$u_i^T u_j = \delta_{ij} \iff \left( U^T U \right)_{ij} = \delta_{ij} \iff U^T U = I.$$
But $U^T U = I$ implies $U^T = U^{-1}$.
1 Therefore the statement about orthonormal rows follows from $U U^{-1} = U U^T = I$.

84 Unitary and Orthogonal Matrices

Proof (cont.)
3 To show that $U$ is an isometry we assume $U$ is orthogonal. Then, for any $x \in \mathbb{R}^n$,
$$\|Ux\|_2^2 = (Ux)^T (Ux) = x^T \underbrace{U^T U}_{= I} x = \|x\|_2^2. \qquad \square$$

85 Unitary and Orthogonal Matrices

Remark. The converse of (3) is also true, i.e., if $\|Ux\|_2 = \|x\|_2$ for all $x \in \mathbb{R}^n$ then $U$ must be orthogonal. Consider $x = e_i$. Then
$$\|U e_i\|_2^2 = u_i^T u_i \overset{(3)}{=} \|e_i\|_2^2 = 1,$$
so the columns of $U$ have norm 1. Moreover, for $x = e_i + e_j$ ($i \neq j$) we get
$$\|U(e_i + e_j)\|_2^2 = \underbrace{u_i^T u_i}_{=1} + u_i^T u_j + u_j^T u_i + \underbrace{u_j^T u_j}_{=1} \overset{(3)}{=} \|e_i + e_j\|_2^2 = 2,$$
so that $u_i^T u_j = 0$ for $i \neq j$ and the columns of $U$ are orthogonal.

86 Unitary and Orthogonal Matrices

Example
The simplest orthogonal matrix is the identity matrix $I$.
Permutation matrices are orthogonal. For a symmetric permutation matrix we even have $P^T = P$, so that $P^T P = P^2 = I$; such matrices are called involutary (see pretest).
An orthogonal matrix can be viewed as a unitary matrix, but a unitary matrix need not be orthogonal. For example, for
$$A = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & \mathrm{i} \\ 1 & -\mathrm{i} \end{pmatrix}$$
we have $A^* A = A A^* = I$, but $A^T A \neq I \neq A A^T$.

87 Unitary and Orthogonal Matrices

Elementary Orthogonal Projectors

Definition. A matrix $Q$ of the form
$$Q = I - u u^T, \qquad u \in \mathbb{R}^n, \quad \|u\|_2 = 1,$$
is called an elementary orthogonal projector.

Remark. Note that $Q$ is not an orthogonal matrix:
$$Q^T = (I - u u^T)^T = I - u u^T = Q.$$

88 Unitary and Orthogonal Matrices

All projectors are idempotent, i.e., $Q^2 = Q$:
$$Q^T Q \overset{\text{above}}{=} Q^2 = (I - u u^T)(I - u u^T) = I - 2 u u^T + u \underbrace{u^T u}_{=1} u^T = I - u u^T = Q.$$

89 Unitary and Orthogonal Matrices

Geometric interpretation

Consider $x = (I - Q)x + Qx$ and observe that $(I - Q)x \perp Qx$:
$$\left( (I - Q)x \right)^T Qx = x^T (I - Q^T) Q x = x^T (Q - \underbrace{Q^T Q}_{= Q}) x = 0.$$
Also,
$$(I - Q)x = (u u^T) x = u (u^T x) \in \operatorname{span}\{u\}.$$
Therefore $Qx \in u^\perp$, the orthogonal complement of $u$. Also note that $\|(u^T x) u\| = |u^T x| \underbrace{\|u\|_2}_{=1}$, so that $|u^T x|$ is the length of the orthogonal projection of $x$ onto $\operatorname{span}\{u\}$.

90 Unitary and Orthogonal Matrices

Summary
$(I - Q)x \in \operatorname{span}\{u\}$, so $I - Q = u u^T = P_u$ is a projection onto $\operatorname{span}\{u\}$.
$Qx \in u^\perp$, so $Q = I - u u^T = P_{u^\perp}$ is a projection onto $u^\perp$.

91 Unitary and Orthogonal Matrices

Remark. Above we assumed that $\|u\|_2 = 1$. For an arbitrary vector $v$ we get a unit vector
$$u = \frac{v}{\|v\|_2} = \frac{v}{\sqrt{v^T v}}.$$
Therefore, for general $v$,
$$P_v = \frac{v v^T}{v^T v} \text{ is a projection onto } \operatorname{span}\{v\}, \qquad P_{v^\perp} = I - P_v = I - \frac{v v^T}{v^T v} \text{ is a projection onto } v^\perp.$$

92 Unitary and Orthogonal Matrices

Elementary Reflections

Definition. Let $v (\neq 0) \in \mathbb{R}^n$. Then
$$R = I - 2 \frac{v v^T}{v^T v}$$
is called the elementary (or Householder) reflector about $v$.

Remark. For $u \in \mathbb{R}^n$ with $\|u\|_2 = 1$ we have $R = I - 2 u u^T$.

93 Unitary and Orthogonal Matrices

Geometric interpretation

Consider $\|u\|_2 = 1$, and note that $Qx = (I - u u^T)x$ is the orthogonal projection of $x$ onto $u^\perp$ as above. Also,
$$Q(Rx) = Q(I - 2 u u^T)x = Q \left( I - 2(I - Q) \right) x = (Q - 2Q + 2 \underbrace{Q^2}_{= Q}) x = Qx,$$
so that $Qx$ is also the orthogonal projection of $Rx$ onto $u^\perp$. Moreover,
$$\|x - Qx\| = \|x - (I - u u^T)x\| = \|(u^T x) u\| = |u^T x|$$
and
$$\|Qx - Rx\| = \|(Q - R)x\| = \left\| \left( I - u u^T - (I - 2 u u^T) \right) x \right\| = \|u u^T x\| = |u^T x|.$$
Together, $Rx$ is the reflection of $x$ about $u^\perp$.

94 Unitary and Orthogonal Matrices

Properties of elementary reflections

Theorem. Let $R$ be an elementary reflector. Then
$$R^{-1} = R^T = R,$$
i.e., $R$ is orthogonal, symmetric, and involutary.

Remark. However, these properties do not characterize a reflection, i.e., an orthogonal, symmetric and involutary matrix is not necessarily a reflection (see HW).

95 Unitary and Orthogonal Matrices

Proof.
$$R^T = (I - 2 u u^T)^T = I - 2 u u^T = R.$$
Also,
$$R^2 = (I - 2 u u^T)(I - 2 u u^T) = I - 4 u u^T + 4 u \underbrace{u^T u}_{=1} u^T = I,$$
so that $R^{-1} = R$. $\square$

96 Unitary and Orthogonal Matrices

Reflection of x onto e₁

If we can construct a matrix $R$ such that $Rx = \alpha e_1$, then we can use $R$ to zero out entries in (the first column of) a matrix. To this end consider
$$v = x \pm \mu \|x\|_2 e_1, \qquad \text{where } \mu = \begin{cases} 1 & \text{if } x_1 \text{ real}, \\ \dfrac{x_1}{|x_1|} & \text{if } x_1 \text{ complex}, \end{cases}$$
and note that
$$v^T v = (x \pm \mu \|x\|_2 e_1)^T (x \pm \mu \|x\|_2 e_1) = x^T x \pm 2 \mu \|x\|_2 e_1^T x + \underbrace{\mu^2}_{=1} \|x\|_2^2 = 2 \left( x^T x \pm \mu \|x\|_2 e_1^T x \right) = 2 v^T x. \tag{9}$$

97 Unitary and Orthogonal Matrices

Our Householder reflection was defined as
$$R = I - 2 \frac{v v^T}{v^T v},$$
so that
$$Rx = x - 2 \frac{v v^T x}{v^T v} = x - \underbrace{\frac{2 v^T x}{v^T v}}_{\overset{(9)}{=} 1} v = x - v = \mp \mu \|x\|_2 e_1 \ (= \alpha e_1).$$

Remark. These special reflections are used in the Householder variant of the QR factorization. For optimal numerical stability with real matrices one lets $\mu = \operatorname{sign}(x_1)$.
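A minimal sketch of this construction in NumPy (the helper name householder_vector and the test vector are my own, hypothetical choices):

```python
import numpy as np

def householder_vector(x):
    """Return v such that R = I - 2 v v^T / (v^T v) maps x to -mu*||x||_2 e_1.

    The sign mu = sign(x_1) avoids cancellation (numerical stability).
    """
    v = x.astype(float).copy()
    mu = 1.0 if x[0] >= 0 else -1.0
    v[0] += mu * np.linalg.norm(x)   # v = x + mu ||x||_2 e_1
    return v

x = np.array([3.0, 1.0, 2.0])
v = householder_vector(x)
R = np.eye(3) - 2.0 * np.outer(v, v) / (v @ v)

print(R @ x)                             # approx (-||x||_2, 0, 0)
print(np.allclose(R @ R.T, np.eye(3)))   # R is orthogonal
```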

98 Unitary and Orthogonal Matrices

Remark. Since $R^2 = I$ (i.e., $R^{-1} = R$) we have, whenever $\|x\|_2 = 1$,
$$Rx = \mu e_1 \implies R^2 x = \mu R e_1 \implies x = \mu R_{*1}.$$
Therefore the matrix $U = R$ (taking $\mu = 1$) is orthogonal (since $R$ is) and contains $x$ as its first column. Thus, this allows us to construct an ON basis for $\mathbb{R}^n$ that contains $x$ (see example in [Mey00]).

99 Unitary and Orthogonal Matrices

Rotations

We give only a brief overview (more details can be found in [Mey00]). We begin in $\mathbb{R}^2$ and look for a matrix representation of the rotation of a vector $u$ into another vector $v$, counterclockwise by an angle $\theta$. Here
$$u = \begin{pmatrix} u_1 \\ u_2 \end{pmatrix} = \begin{pmatrix} \|u\| \cos\varphi \\ \|u\| \sin\varphi \end{pmatrix} \tag{10}$$
and
$$v = \begin{pmatrix} v_1 \\ v_2 \end{pmatrix} = \begin{pmatrix} \|v\| \cos(\varphi + \theta) \\ \|v\| \sin(\varphi + \theta) \end{pmatrix}. \tag{11}$$

100 Unitary and Orthogonal Matrices

We use the trigonometric identities
$$\cos(A + B) = \cos A \cos B - \sin A \sin B, \qquad \sin(A + B) = \sin A \cos B + \sin B \cos A$$
with $A = \varphi$, $B = \theta$ and $\|v\| = \|u\|$ to get
$$v \overset{(11)}{=} \begin{pmatrix} \|v\| \cos(\varphi + \theta) \\ \|v\| \sin(\varphi + \theta) \end{pmatrix} = \begin{pmatrix} \|u\| (\cos\varphi \cos\theta - \sin\varphi \sin\theta) \\ \|u\| (\sin\varphi \cos\theta + \sin\theta \cos\varphi) \end{pmatrix} \overset{(10)}{=} \begin{pmatrix} u_1 \cos\theta - u_2 \sin\theta \\ u_2 \cos\theta + u_1 \sin\theta \end{pmatrix} = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} u = Pu,$$
where $P$ is the rotation matrix.

101 Unitary and Orthogonal Matrices

Remark. Note that
$$P^T P = \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} = \begin{pmatrix} \cos^2\theta + \sin^2\theta & -\cos\theta \sin\theta + \cos\theta \sin\theta \\ -\cos\theta \sin\theta + \cos\theta \sin\theta & \sin^2\theta + \cos^2\theta \end{pmatrix} = I,$$
so that $P$ is an orthogonal matrix. $P^T$ is also a rotation matrix (by the angle $-\theta$).

102 Unitary and Orthogonal Matrices

Rotations about a coordinate axis in $\mathbb{R}^3$ are very similar. Such rotations are referred to as plane rotations. For example, rotation about the $x$-axis (in the $yz$-plane) is accomplished with
$$P_{yz} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & -\sin\theta \\ 0 & \sin\theta & \cos\theta \end{pmatrix}.$$
Rotation about the $y$- and $z$-axes is done analogously.

103 Unitary and Orthogonal Matrices

We can use the same ideas for plane rotations in higher dimensions.

Definition. An orthogonal matrix of the form
$$P_{ij} = \begin{pmatrix} 1 & & & & & \\ & \ddots & & & & \\ & & c & \cdots & s & \\ & & \vdots & \ddots & \vdots & \\ & & -s & \cdots & c & \\ & & & & & \ddots \end{pmatrix}$$
with $c^2 + s^2 = 1$ (entries $c$ in positions $(i,i)$ and $(j,j)$, $s$ in position $(i,j)$, $-s$ in position $(j,i)$, and ones elsewhere on the diagonal) is called a plane rotation (or Givens rotation). Note that the orientation is reversed from the earlier discussion.

104 Unitary and Orthogonal Matrices

Usually we set
$$c = \frac{x_i}{\sqrt{x_i^2 + x_j^2}}, \qquad s = \frac{x_j}{\sqrt{x_i^2 + x_j^2}},$$
since then, for $x = \begin{pmatrix} x_1 & \cdots & x_n \end{pmatrix}^T$,
$$P_{ij} x = \begin{pmatrix} x_1 \\ \vdots \\ c x_i + s x_j \\ \vdots \\ -s x_i + c x_j \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} x_1 \\ \vdots \\ \sqrt{x_i^2 + x_j^2} \\ \vdots \\ 0 \\ \vdots \\ x_n \end{pmatrix}.$$
This shows that $P_{ij}$ zeros the $j$th component of $x$.
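A small sketch of this construction (the helper name givens and the test vector are my own, hypothetical choices):

```python
import numpy as np

def givens(x, i, j):
    """Return the Givens rotation P_ij that zeros component j of x."""
    n = len(x)
    r = np.hypot(x[i], x[j])          # sqrt(x_i^2 + x_j^2), computed safely
    c, s = x[i] / r, x[j] / r
    P = np.eye(n)
    P[i, i], P[i, j] = c, s
    P[j, i], P[j, j] = -s, c
    return P

x = np.array([1.0, 2.0, 2.0])
P = givens(x, 0, 2)
print(P @ x)   # component 2 is zeroed; component 0 becomes sqrt(1 + 4) = sqrt(5)
```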

105 Unitary and Orthogonal Matrices

Note that
$$\sqrt{\left( \sqrt{x_i^2 + x_j^2} \right)^2 + x_k^2} = \sqrt{x_i^2 + x_j^2 + x_k^2},$$
so that repeatedly applying Givens rotations $P_{ij}$ with the same $i$ but different values of $j$ will zero out all but the $i$th component of $x$, and that component will become
$$\sqrt{x_1^2 + \ldots + x_n^2} = \|x\|_2.$$
Therefore, the sequence
$$P = P_{in} \cdots P_{i,i+1} P_{i,i-1} \cdots P_{i1}$$
of Givens rotations rotates the vector $x \in \mathbb{R}^n$ onto $e_i$, i.e.,
$$Px = \|x\|_2 e_i.$$
Moreover, the matrix $P$ is orthogonal.

106 Unitary and Orthogonal Matrices

Remark. Givens rotations can be used as an alternative to Householder reflections to construct a QR factorization. Householder reflections are in general more efficient, but for sparse matrices Givens rotations are preferable because they can be applied more selectively.

107 Orthogonal Reduction

Recall the form of LU factorization (Gaussian elimination):
$$T_{n-1} \cdots T_2 T_1 A = U,$$
where the $T_k$ are lower triangular and $U$ is upper triangular, i.e., we have a triangular reduction.

For the QR factorization we will use orthogonal Householder reflectors $R_k$ to get
$$R_{n-1} \cdots R_2 R_1 A = T,$$
where $T$ is upper triangular, i.e., we have an orthogonal reduction.

108 Orthogonal Reduction

Recall Householder reflectors
$$R = I - 2 \frac{v v^T}{v^T v}, \qquad \text{with } v = x \pm \mu \|x\| e_1,$$
so that
$$Rx = \mp \mu \|x\| e_1,$$
and $\mu = 1$ for $x$ real. Now we explain how to use these Householder reflectors to convert an $m \times n$ matrix $A$ to an upper triangular matrix of the same size, i.e., how to do a full QR factorization.

109 Orthogonal Reduction

Apply a Householder reflector to the first column of $A$:
$$R_1 A_{*1} = \left( I - 2 \frac{v v^T}{v^T v} \right) A_{*1} \qquad \text{with } v = A_{*1} \pm \|A_{*1}\| e_1,$$
so that
$$R_1 A_{*1} = \mp \|A_{*1}\| e_1 = \begin{pmatrix} t_{11} \\ 0 \\ \vdots \\ 0 \end{pmatrix}.$$
Then $R_1$ applied to all of $A$ yields
$$R_1 A = \begin{pmatrix} t_{11} & t_{12} & \cdots & t_{1n} \\ 0 & & & \\ \vdots & & A_2 & \\ 0 & & & \end{pmatrix} = \begin{pmatrix} t_{11} & t_1^T \\ 0 & A_2 \end{pmatrix}.$$

110 Orthogonal Reduction

Next, we apply the same idea to $A_2$, i.e., we let
$$R_2 = \begin{pmatrix} 1 & 0^T \\ 0 & \hat{R}_2 \end{pmatrix}.$$
Then
$$R_2 R_1 A = \begin{pmatrix} t_{11} & t_1^T \\ 0 & \hat{R}_2 A_2 \end{pmatrix} = \begin{pmatrix} t_{11} & t_{12} & \cdots & t_{1n} \\ 0 & t_{22} & t_2^T & \\ \vdots & 0 & & \\ 0 & \vdots & A_3 & \end{pmatrix}.$$

111 Orthogonal Reduction

We continue the process until we get an upper triangular matrix, i.e.,
$$\underbrace{R_n \cdots R_2 R_1}_{= P} A = \begin{pmatrix} t_{11} & & * \\ & \ddots & \\ & & t_{nn} \\ & O & \end{pmatrix} \quad \text{whenever } m > n,$$
or
$$\underbrace{R_m \cdots R_2 R_1}_{= P} A = \begin{pmatrix} t_{11} & & & * \\ & \ddots & & \\ O & & t_{mm} & \end{pmatrix} \quad \text{whenever } n > m.$$
Since each $R_k$ is orthogonal (unitary for complex $A$) we have $PA = T$ with $P$ $m \times m$ orthogonal and $T$ $m \times n$ upper triangular, i.e.,
$$A = QR \qquad (Q = P^T, \ R = T).$$
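The whole reduction fits in a short loop. The sketch below (assumptions: real $A$ with generic entries, plain NumPy, function name my own) applies one reflector per column and accumulates $Q = P^T$ on the fly:

```python
import numpy as np

def householder_qr(A):
    """Full QR factorization via Householder reflectors: A = Q R.

    Applies one reflector per column to zero the entries below the diagonal.
    Returns Q (m x m, orthogonal) and R (m x n, upper triangular).
    Assumes generic real input (no zero pivot columns).
    """
    m, n = A.shape
    R = A.astype(float).copy()
    Q = np.eye(m)
    for k in range(min(n, m - 1)):
        x = R[k:, k]
        v = x.copy()
        mu = 1.0 if x[0] >= 0 else -1.0
        v[0] += mu * np.linalg.norm(x)       # v = x + mu ||x|| e_1
        beta = 2.0 / (v @ v)
        # apply R_k = I - beta v v^T to the trailing block, and accumulate Q = R_1 R_2 ...
        R[k:, k:] -= beta * np.outer(v, v @ R[k:, k:])
        Q[:, k:] -= beta * np.outer(Q[:, k:] @ v, v)
    return Q, R

A = np.random.default_rng(4).standard_normal((5, 3))
Q, R = householder_qr(A)
print(np.allclose(Q @ R, A), np.allclose(Q.T @ Q, np.eye(5)))
```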

112 Orthogonal Reduction

Remark. This is similar to obtaining the QR factorization via MGS, but now $Q$ is orthogonal (square) and $R$ is rectangular. This gives us the full QR factorization, whereas MGS gave us the reduced QR factorization (with $m \times n$ $Q$ and $n \times n$ $R$).

113 Orthogonal Reduction

Example. We use Householder reflections to find the QR factorization (where $R$ has positive diagonal elements) of a given $3 \times 3$ matrix $A$. We take
$$R_1 = I - 2 \frac{v_1 v_1^T}{v_1^T v_1}, \qquad \text{with } v_1 = A_{*1} \pm \|A_{*1}\| e_1,$$
so that
$$R_1 A_{*1} = \mp \|A_{*1}\| e_1 = \begin{pmatrix} t_{11} \\ 0 \\ 0 \end{pmatrix}.$$
Thus we take the $\pm$ sign so that $t_{11} = 2 > 0$.

114 Orthogonal Reduction

Example (cont.) To find $R_1 A$ we can either compute $R_1$ using the formula above and then compute the matrix-matrix product, or more cheaply note that
$$R_1 x = \left( I - 2 \frac{v_1 v_1^T}{v_1^T v_1} \right) x = x - \frac{2 v_1^T x}{v_1^T v_1} v_1,$$
so that we can compute the scalars $v_1^T A_{*j}$, $j = 2, 3$, instead of the full $R_1$.

115 Orthogonal Reduction

Example (cont.) Applying this to the second and third columns gives $R_1 A_{*2}$ and $R_1 A_{*3}$, so that
$$R_1 A = \begin{pmatrix} t_{11} & t_{12} & t_{13} \\ 0 & & \\ 0 & A_2 & \end{pmatrix}.$$

116 Orthogonal Reduction

Example (cont.) Next we treat $A_2$ the same way:
$$\hat{R}_2 x = x - \frac{2 v_2^T x}{v_2^T v_2} v_2 \qquad \text{with } v_2 = (A_2)_{*1} - \|(A_2)_{*1}\| e_1,$$
and, using
$$R_2 = \begin{pmatrix} 1 & 0^T \\ 0 & \hat{R}_2 \end{pmatrix},$$
we get
$$\underbrace{R_2 R_1}_{= P} A = T.$$

117 Orthogonal Reduction

Remark. As mentioned earlier, the factor $R$ of the QR factorization is given by the matrix $T$. The factor $Q = P^T$ is not explicitly given in the example. One could also obtain the same answer using Givens rotations (compare [Mey00, Example 5.7.2]).

118 Orthogonal Reduction

Theorem. Let $A$ be an $n \times n$ nonsingular real matrix. Then the factorization $A = QR$ with $n \times n$ orthogonal matrix $Q$ and $n \times n$ upper triangular matrix $R$ with positive diagonal entries is unique.

Remark. In this $n \times n$ case the reduced and full QR factorizations coincide, i.e., the results obtained via Gram–Schmidt, Householder and Givens should be identical.

119 Orthogonal Reduction

Proof. Assume we have two QR factorizations
$$A = Q_1 R_1 = Q_2 R_2 \iff Q_2^T Q_1 = R_2 R_1^{-1} = U.$$
Now, $R_2 R_1^{-1}$ is upper triangular with positive diagonal (since each factor is) and $Q_2^T Q_1$ is orthogonal. Therefore $U$ has all of these properties. Since $U$ is upper triangular,
$$U_{*1} = \begin{pmatrix} u_{11} \\ 0 \\ \vdots \\ 0 \end{pmatrix}.$$
Moreover, since $U$ is orthogonal, $u_{11} = 1$.

120 Orthogonal Reduction

Proof (cont.) Next,
$$U_{*1}^T U_{*2} = \begin{pmatrix} 1 & 0 & \cdots & 0 \end{pmatrix} \begin{pmatrix} u_{12} \\ u_{22} \\ \vdots \\ 0 \end{pmatrix} = u_{12} = 0$$
since the columns of $U$ are orthogonal, and the fact that $\|U_{*2}\| = 1$ implies $u_{22} = 1$. Comparing all the other pairs of columns of $U$ shows that $U = I$, and therefore $Q_1 = Q_2$ and $R_1 = R_2$. $\square$

121 Orthogonal Reduction

Recommendations (so far) for the solution of $Ax = b$
1 If $A$ is square and nonsingular, then use LU factorization with partial pivoting. This is stable for most practical problems and requires $O(\frac{n^3}{3})$ operations.
2 To find a least squares solution, use the QR factorization:
$$Ax = b \iff QRx = b \iff Rx = Q^T b.$$
Usually the reduced QR factorization is all that's needed.

122 Orthogonal Reduction

Even though (for square nonsingular $A$) the Gram–Schmidt, Householder and Givens versions of the QR factorization are equivalent (due to the uniqueness theorem), we have for general $A$ that
classical GS is not stable,
modified GS is stable for least squares, but unstable for QR (since it has problems maintaining orthogonality),
Householder and Givens are stable, both for least squares and QR.

123 Orthogonal Reduction

Computational cost (for $n \times n$ matrices)
LU with partial pivoting: $O(\frac{n^3}{3})$
Gram–Schmidt: $O(n^3)$
Householder: $O(\frac{2n^3}{3})$
Givens: $O(\frac{4n^3}{3})$
Householder reflections are often the preferred method since they provide both stability and decent efficiency.

124 Complementary Subspaces

Definition. Let $\mathcal{V}$ be a vector space and $\mathcal{X}, \mathcal{Y} \subseteq \mathcal{V}$ be subspaces. $\mathcal{X}$ and $\mathcal{Y}$ are called complementary provided
$$\mathcal{V} = \mathcal{X} + \mathcal{Y} \quad \text{and} \quad \mathcal{X} \cap \mathcal{Y} = \{0\}.$$
In this case, $\mathcal{V}$ is also called the direct sum of $\mathcal{X}$ and $\mathcal{Y}$, and we write $\mathcal{V} = \mathcal{X} \oplus \mathcal{Y}$.

Example
Any two distinct lines through the origin in $\mathbb{R}^2$ are complementary.
Any plane through the origin in $\mathbb{R}^3$ is complementary to any line through the origin not contained in the plane.
Two planes through the origin in $\mathbb{R}^3$ are not complementary since they must intersect in a line.

125 Complementary Subspaces

Theorem. Let $\mathcal{V}$ be a vector space, and let $\mathcal{X}, \mathcal{Y} \subseteq \mathcal{V}$ be subspaces with bases $B_\mathcal{X}$ and $B_\mathcal{Y}$. The following are equivalent:
1 $\mathcal{V} = \mathcal{X} \oplus \mathcal{Y}$.
2 For every $v \in \mathcal{V}$ there exist unique $x \in \mathcal{X}$ and $y \in \mathcal{Y}$ such that $v = x + y$.
3 $B_\mathcal{X} \cap B_\mathcal{Y} = \{\}$ and $B_\mathcal{X} \cup B_\mathcal{Y}$ is a basis for $\mathcal{V}$.

Proof. See [Mey00].

Definition. Suppose $\mathcal{V} = \mathcal{X} \oplus \mathcal{Y}$, i.e., any $v \in \mathcal{V}$ can be uniquely decomposed as $v = x + y$. Then
1 $x$ is called the projection of $v$ onto $\mathcal{X}$ along $\mathcal{Y}$.
2 $y$ is called the projection of $v$ onto $\mathcal{Y}$ along $\mathcal{X}$.

126 Complementary Subspaces

Properties of projectors

Theorem. Let $\mathcal{X}, \mathcal{Y}$ be complementary subspaces of $\mathcal{V}$. Let $P$, defined by $Pv = x$, be the projector onto $\mathcal{X}$ along $\mathcal{Y}$. Then
1 $P$ is unique.
2 $P^2 = P$, i.e., $P$ is idempotent.
3 $I - P$ is the complementary projector (onto $\mathcal{Y}$ along $\mathcal{X}$).
4 $R(P) = \{x : Px = x\} = \mathcal{X}$ (fixed points of $P$).
5 $N(I - P) = \mathcal{X} = R(P)$ and $R(I - P) = N(P) = \mathcal{Y}$.
6 If $\mathcal{V} = \mathbb{R}^n$ (or $\mathbb{C}^n$), then
$$P = \begin{pmatrix} X & O \end{pmatrix} \begin{pmatrix} X & Y \end{pmatrix}^{-1} = \begin{pmatrix} X & Y \end{pmatrix} \begin{pmatrix} I & O \\ O & O \end{pmatrix} \begin{pmatrix} X & Y \end{pmatrix}^{-1},$$
where the columns of $X$ and $Y$ are bases for $\mathcal{X}$ and $\mathcal{Y}$.
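Property (6) is easy to try out numerically. Below is a small illustrative sketch (the particular plane $\mathcal{X}$ and line $\mathcal{Y}$ in $\mathbb{R}^3$ are my own choices) building $P = (X\ O)(X\ Y)^{-1}$ and checking idempotence:

```python
import numpy as np

# Basis for a plane X and a (non-orthogonal) complementary line Y in R^3
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])
Y = np.array([[1.0],
              [1.0],
              [1.0]])

B = np.hstack([X, Y])                                      # (X Y), nonsingular
P = np.hstack([X, np.zeros_like(Y)]) @ np.linalg.inv(B)    # P = (X O)(X Y)^{-1}

print(np.allclose(P @ P, P))   # idempotent
v = np.array([2.0, 3.0, 5.0])
x, y = P @ v, (np.eye(3) - P) @ v
print(x, y)                    # x lies in the plane, y is a multiple of (1,1,1)
```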

127 Complementary Subspaces

Proof
1 Assume $P_1 v = x = P_2 v$ for all $v \in \mathcal{V}$. But then $P_1 = P_2$.
2 We know $Pv = x$ for every $v \in \mathcal{V}$, so that
$$P^2 v = P(Pv) = Px = x.$$
Together we therefore have $P^2 = P$.
3 Using the unique decomposition of $v$ we can write
$$v = x + y = Pv + y \iff (I - P)v = y,$$
the projection of $v$ onto $\mathcal{Y}$ along $\mathcal{X}$.

128 Complementary Subspaces

Proof (cont.)
4 Note that $x \in R(P)$ if and only if $x = Px$. This is true since if $x = Px$ then $x$ is obviously in $R(P)$. On the other hand, if $x \in R(P)$ then $x = Pv$ for some $v \in \mathcal{V}$, and so
$$Px = P^2 v \overset{(2)}{=} Pv = x.$$
Therefore
$$R(P) = \{x : x = Pv,\ v \in \mathcal{V}\} = \mathcal{X} = \{x : Px = x\}.$$
5 Since $N(I - P) = \{x : (I - P)x = 0\}$, and $(I - P)x = 0 \iff Px = x$, we have $N(I - P) = \mathcal{X} = R(P)$. The claim $R(I - P) = \mathcal{Y} = N(P)$ is shown similarly.

129 Complementary Subspaces

Proof (cont.)
6 Take $B = \begin{pmatrix} X & Y \end{pmatrix}$, where the columns of $X$ and $Y$ form a basis for $\mathcal{X}$ and $\mathcal{Y}$, respectively. Then the columns of $B$ form a basis for $\mathcal{V}$ and $B$ is nonsingular. From above we have $Px = x$, where $x$ can be any column of $X$. Also, $Py = 0$, where $y$ is any column of $Y$. So
$$PB = P \begin{pmatrix} X & Y \end{pmatrix} = \begin{pmatrix} X & O \end{pmatrix} \quad \text{or} \quad P = \begin{pmatrix} X & O \end{pmatrix} B^{-1} = \begin{pmatrix} X & O \end{pmatrix} \begin{pmatrix} X & Y \end{pmatrix}^{-1}.$$
This establishes the first part of (6). The second part follows by noting that
$$\begin{pmatrix} X & Y \end{pmatrix} \begin{pmatrix} I & O \\ O & O \end{pmatrix} = \begin{pmatrix} X & O \end{pmatrix}. \qquad \square$$

130 Complementary Subspaces

We just saw that any projector is idempotent, i.e., $P^2 = P$. In fact,

Theorem. A matrix $P$ is a projector if and only if $P^2 = P$.

Proof. One direction is given above. For the other see [Mey00].

Remark. This theorem is sometimes used to define projectors.

131 Complementary Subspaces

Angle between subspaces

In some applications, e.g., when determining the convergence rates of iterative algorithms, it is useful to know the angle between subspaces. If $\mathcal{R}, \mathcal{N}$ are complementary then
$$\sin\theta = \frac{1}{\|P\|_2} = \frac{1}{\sqrt{\lambda_{\max}}} = \frac{1}{\sigma_1},$$
where $P$ is the projector onto $\mathcal{R}$ along $\mathcal{N}$, $\lambda_{\max}$ is the largest eigenvalue of $P^T P$, and $\sigma_1$ is the largest singular value of $P$. See [Mey00, Example 5.9.2] for more details.

132 Complementary Subspaces

Remark. We will skip [Mey00, Section 5.10] on the range-nullspace decomposition. While the range-nullspace decomposition is theoretically important, its practical usefulness is limited because its computation is very unstable due to lack of orthogonality. This also means we will not discuss nilpotent matrices and, later on, the Jordan normal form.

133 Orthogonal Decomposition

Definition. Let $\mathcal{V}$ be an inner product space and $\mathcal{M} \subseteq \mathcal{V}$. The orthogonal complement $\mathcal{M}^\perp$ of $\mathcal{M}$ is
$$\mathcal{M}^\perp = \{x \in \mathcal{V} : \langle m, x \rangle = 0 \text{ for all } m \in \mathcal{M}\}.$$

Remark. Even if $\mathcal{M}$ is not a subspace of $\mathcal{V}$ (i.e., only a subset), $\mathcal{M}^\perp$ is (see HW).

134 Orthogonal Decomposition

Theorem. Let $\mathcal{V}$ be an inner product space and $\mathcal{M} \subseteq \mathcal{V}$. If $\mathcal{M}$ is a subspace of $\mathcal{V}$, then
$$\mathcal{V} = \mathcal{M} \oplus \mathcal{M}^\perp.$$

Proof. According to the definition of complementary subspaces we need to show
1 $\mathcal{M} \cap \mathcal{M}^\perp = \{0\}$,
2 $\mathcal{M} + \mathcal{M}^\perp = \mathcal{V}$.

135 Orthogonal Decomposition

Proof (cont.)
1 Let's assume there exists an $x \in \mathcal{M} \cap \mathcal{M}^\perp$, i.e., $x \in \mathcal{M}$ and $x \in \mathcal{M}^\perp$. The definition of $\mathcal{M}^\perp$ implies $\langle x, x \rangle = 0$. But then the definition of an inner product implies $x = 0$. This is true for any $x \in \mathcal{M} \cap \mathcal{M}^\perp$, so $x = 0$ is the only such vector.

136 Orthogonal Decomposition

Proof (cont.)
2 We let $B_\mathcal{M}$ and $B_{\mathcal{M}^\perp}$ be ON bases for $\mathcal{M}$ and $\mathcal{M}^\perp$, respectively. Since $\mathcal{M} \cap \mathcal{M}^\perp = \{0\}$ we know that $B_\mathcal{M} \cup B_{\mathcal{M}^\perp}$ is an ON basis for some $\mathcal{S} \subseteq \mathcal{V}$. In fact, $\mathcal{S} = \mathcal{V}$, since otherwise we could extend $B_\mathcal{M} \cup B_{\mathcal{M}^\perp}$ to an ON basis of $\mathcal{V}$ (using the extension theorem and GS). However, any vector in the extension must be orthogonal to $\mathcal{M}$, i.e., in $\mathcal{M}^\perp$, but this is not possible since the extended basis must be linearly independent. Therefore, the extension set is empty. $\square$

137 Orthogonal Decomposition

Theorem. Let $\mathcal{V}$ be an inner product space with $\dim(\mathcal{V}) = n$ and let $\mathcal{M}$ be a subspace of $\mathcal{V}$. Then
1 $\dim \mathcal{M}^\perp = n - \dim \mathcal{M}$,
2 $\mathcal{M}^{\perp\perp} = \mathcal{M}$.

Proof. For (1) recall our dimension formula from Chapter 4:
$$\dim(\mathcal{X} + \mathcal{Y}) = \dim \mathcal{X} + \dim \mathcal{Y} - \dim(\mathcal{X} \cap \mathcal{Y}).$$
Here $\mathcal{M} \cap \mathcal{M}^\perp = \{0\}$, so that $\dim(\mathcal{M} \cap \mathcal{M}^\perp) = 0$. Also, since $\mathcal{M}$ is a subspace of $\mathcal{V}$ we have $\mathcal{V} = \mathcal{M} + \mathcal{M}^\perp$, and the dimension formula implies (1).

138 Orthogonal Decomposition

Proof (cont.)
2 Instead of directly establishing equality we first show that $\mathcal{M}^{\perp\perp} \subseteq \mathcal{M}$. Since $\mathcal{M} \oplus \mathcal{M}^\perp = \mathcal{V}$, any $x \in \mathcal{V}$ can be uniquely decomposed into
$$x = m + n \quad \text{with } m \in \mathcal{M},\ n \in \mathcal{M}^\perp.$$
Now we take $x \in \mathcal{M}^{\perp\perp}$, so that $\langle x, n \rangle = 0$ for all $n \in \mathcal{M}^\perp$, and therefore
$$0 = \langle x, n \rangle = \langle m + n, n \rangle = \underbrace{\langle m, n \rangle}_{= 0} + \langle n, n \rangle.$$
But
$$\langle n, n \rangle = 0 \iff n = 0,$$
and therefore $x = m$ is in $\mathcal{M}$.

139 Orthogonal Decomposition

Proof (cont.) Now, recall from Chapter 4 that for subspaces $\mathcal{X} \subseteq \mathcal{Y}$
$$\dim \mathcal{X} = \dim \mathcal{Y} \implies \mathcal{X} = \mathcal{Y}.$$
We take $\mathcal{X} = \mathcal{M}^{\perp\perp}$ and $\mathcal{Y} = \mathcal{M}$ (and know from the work just performed that $\mathcal{M}^{\perp\perp}$ is a subspace of $\mathcal{M}$). From (1) we know
$$\dim \mathcal{M}^\perp = n - \dim \mathcal{M} \implies \dim \mathcal{M}^{\perp\perp} = n - \dim \mathcal{M}^\perp = n - (n - \dim \mathcal{M}) = \dim \mathcal{M}.$$
But then $\mathcal{M}^{\perp\perp} = \mathcal{M}$. $\square$

140 Orthogonal Decomposition

Back to Fundamental Subspaces

Theorem. Let $A$ be a real $m \times n$ matrix. Then
1 $R(A)^\perp = N(A^T)$,
2 $N(A)^\perp = R(A^T)$.

Corollary
$$\mathbb{R}^m = R(A) \oplus R(A)^\perp = R(A) \oplus N(A^T),$$
$$\mathbb{R}^n = N(A) \oplus N(A)^\perp = N(A) \oplus R(A^T).$$

141 Orthogonal Decomposition

Proof (of Theorem)
1 We show that $x \in R(A)^\perp$ if and only if $x \in N(A^T)$:
$$x \in R(A)^\perp \iff \langle Ay, x \rangle = 0 \text{ for any } y \in \mathbb{R}^n \iff y^T A^T x = 0 \text{ for any } y \in \mathbb{R}^n \iff \langle y, A^T x \rangle = 0 \text{ for any } y \in \mathbb{R}^n \iff A^T x = 0 \iff x \in N(A^T)$$
by the definitions of these subspaces and of an inner product.
2 Using (1), we have
$$R(A)^\perp \overset{(1)}{=} N(A^T) \iff R(A) = N(A^T)^\perp \overset{A \to A^T}{\implies} R(A^T) = N(A)^\perp. \qquad \square$$

142 Orthogonal Decomposition

Starting to think about the SVD

The decompositions of $\mathbb{R}^m$ and $\mathbb{R}^n$ from the corollary help prepare for the SVD of an $m \times n$ matrix $A$. Assume $\operatorname{rank}(A) = r$ and let
$B_{R(A)} = \{u_1, \ldots, u_r\}$, an ON basis for $R(A) \subseteq \mathbb{R}^m$,
$B_{N(A^T)} = \{u_{r+1}, \ldots, u_m\}$, an ON basis for $N(A^T) \subseteq \mathbb{R}^m$,
$B_{R(A^T)} = \{v_1, \ldots, v_r\}$, an ON basis for $R(A^T) \subseteq \mathbb{R}^n$,
$B_{N(A)} = \{v_{r+1}, \ldots, v_n\}$, an ON basis for $N(A) \subseteq \mathbb{R}^n$.
By the corollary,
$B_{R(A)} \cup B_{N(A^T)}$ is an ON basis for $\mathbb{R}^m$,
$B_{R(A^T)} \cup B_{N(A)}$ is an ON basis for $\mathbb{R}^n$,
and therefore the following are orthogonal matrices:
$$U = \begin{pmatrix} u_1 & u_2 & \cdots & u_m \end{pmatrix}, \qquad V = \begin{pmatrix} v_1 & v_2 & \cdots & v_n \end{pmatrix}.$$

143 Orthogonal Decomposition

Consider
$$R = U^T A V = \left( u_i^T A v_j \right)_{i,j=1}^{m,n}.$$
Note that
$$A v_j = 0, \quad j = r+1, \ldots, n, \qquad u_i^T A = 0 \iff A^T u_i = 0, \quad i = r+1, \ldots, m,$$
so
$$R = \begin{pmatrix} u_1^T A v_1 & \cdots & u_1^T A v_r & \\ \vdots & & \vdots & O \\ u_r^T A v_1 & \cdots & u_r^T A v_r & \\ O & & & O \end{pmatrix} = \begin{pmatrix} C_{r \times r} & O \\ O & O \end{pmatrix}.$$

144 Orthogonal Decomposition

Thus
$$R = U^T A V = \begin{pmatrix} C_{r \times r} & O \\ O & O \end{pmatrix} \iff A = U R V^T = U \begin{pmatrix} C_{r \times r} & O \\ O & O \end{pmatrix} V^T,$$
the URV factorization of $A$.

Remark. The matrix $C_{r \times r}$ is nonsingular since
$$\operatorname{rank}(C) = \operatorname{rank}(U^T A V) = \operatorname{rank}(A) = r,$$
because multiplication by the orthogonal (and therefore nonsingular) matrices $U^T$ and $V$ does not change the rank of $A$.

145 Orthogonal Decomposition

We have now shown that ON bases for the fundamental subspaces of $A$ yield a URV factorization. As we show next, the converse is also true, i.e., any URV factorization of $A$ yields ON bases for the fundamental subspaces of $A$. However, the URV factorization is not unique: different ON bases result in different factorizations.

146 Orthogonal Decomposition

Consider $A = U R V^T$ with $U$, $V$ orthogonal $m \times m$ and $n \times n$ matrices, respectively, and $R = \begin{pmatrix} C & O \\ O & O \end{pmatrix}$ with $C$ nonsingular. We partition
$$U = \begin{pmatrix} \underbrace{U_1}_{m \times r} & \underbrace{U_2}_{m \times (m-r)} \end{pmatrix}, \qquad V = \begin{pmatrix} \underbrace{V_1}_{n \times r} & \underbrace{V_2}_{n \times (n-r)} \end{pmatrix}.$$
Then $V$ (and therefore also $V^T$) is nonsingular, and we see that
$$R(A) = R(U R V^T) = R(UR) = R \left( \begin{pmatrix} U_1 C & O \end{pmatrix} \right) = R(\underbrace{U_1 C}_{m \times r}) \overset{\operatorname{rank}(C) = r}{=} R(U_1), \tag{12}$$
so that the columns of $U_1$ are an ON basis for $R(A)$.

147 Orthogonal Decomposition

Moreover,
$$N(A^T) \overset{\text{prev. thm.}}{=} R(A)^\perp \overset{(12)}{=} R(U_1)^\perp = R(U_2),$$
since $U$ is orthogonal and $\mathbb{R}^m = R(U_1) \oplus R(U_2)$. This implies that the columns of $U_2$ are an ON basis for $N(A^T)$. The other two cases can be argued similarly, using $N(AB) = N(B)$ provided $\operatorname{rank}(A) = n$.

148 Orthogonal Decomposition

The main difference between a URV factorization and the SVD is that the SVD will contain a diagonal matrix $\Sigma$ with $r$ nonzero singular values, while $R$ contains the full $r \times r$ block $C$. As a first step in this direction, we can easily obtain a URV factorization of $A$ with a lower triangular matrix $C$.

Idea: use Householder reflections (or Givens rotations).

149 Orthogonal Decomposition

Consider an $m \times n$ matrix $A$. We apply an $m \times m$ orthogonal (Householder reflection) matrix $P$ so that
$$PA = \begin{pmatrix} B \\ O \end{pmatrix}, \qquad \text{with } r \times n \text{ matrix } B, \ \operatorname{rank}(B) = r.$$
Next, use an $n \times n$ orthogonal $Q$ as follows:
$$Q B^T = \begin{pmatrix} T \\ O \end{pmatrix}, \qquad \text{with } r \times r \text{ upper triangular } T, \ \operatorname{rank}(T) = r.$$
Then
$$B Q^T = \begin{pmatrix} T^T & O \end{pmatrix} \iff B = \begin{pmatrix} T^T & O \end{pmatrix} Q$$
and
$$\begin{pmatrix} B \\ O \end{pmatrix} = \begin{pmatrix} T^T & O \\ O & O \end{pmatrix} Q.$$

150 Orthogonal Decomposition

Together,
$$PA = \begin{pmatrix} T^T & O \\ O & O \end{pmatrix} Q \iff A = P^T \begin{pmatrix} T^T & O \\ O & O \end{pmatrix} Q,$$
a URV factorization with lower triangular block $T^T$.

Remark. See HW for an example of this process with numbers.

151 Singular Value Decomposition

We know
$$A = U R V^T = U \begin{pmatrix} C & O \\ O & O \end{pmatrix} V^T,$$
where $C$ is upper triangular and $U$, $V$ are orthogonal. Now we want to establish that $C$ can even be made diagonal.

152 Singular Value Decomposition

Note that $\|A\|_2 = \|C\|_2 =: \sigma_1$, since multiplication by an orthogonal matrix does not change the 2-norm (see HW). Also, $\|C\|_2 = \max_{\|z\|_2 = 1} \|Cz\|_2$, so that $\|C\|_2 = \|Cx\|_2$ for some $x$ with $\|x\|_2 = 1$. In fact (see Sect. 5.2), $x$ is such that
$$\left( C^T C - \lambda I \right) x = 0,$$
i.e., $x$ is an eigenvector of $C^T C$, so that
$$\|C\|_2 = \sigma_1 = \sqrt{\lambda} = \sqrt{x^T C^T C x}. \tag{13}$$

153 Singular Value Decomposition

Since $x$ is a unit vector we can extend it to an orthogonal matrix
$$R_x = \begin{pmatrix} x & X \end{pmatrix},$$
e.g., using Householder reflectors as discussed at the end of Sect. 5.6. Similarly, let
$$y = \frac{Cx}{\|Cx\|_2} = \frac{Cx}{\sigma_1}. \tag{14}$$
Then $R_y = \begin{pmatrix} y & Y \end{pmatrix}$ is also orthogonal (and Hermitian/symmetric) since it's a Householder reflector.

154 Singular Value Decomposition

Now
$$\underbrace{R_y^T}_{= R_y} C R_x = \begin{pmatrix} y^T \\ Y^T \end{pmatrix} C \begin{pmatrix} x & X \end{pmatrix} = \begin{pmatrix} y^T C x & y^T C X \\ Y^T C x & Y^T C X \end{pmatrix}.$$
From above,
$$\sigma_1^2 = \lambda \overset{(13)}{=} x^T C^T C x \overset{(14)}{=} \sigma_1 y^T C x \implies y^T C x = \sigma_1.$$
Also,
$$Y^T C x \overset{(14)}{=} Y^T (\sigma_1 y) = 0,$$
since $R_y$ is orthogonal, i.e., $y$ is orthogonal to the columns of $Y$.

155 Singular Value Decomposition

Let $Y^T C X = C_2$ and $y^T C X = c^T$, so that
$$R_y C R_x = \begin{pmatrix} \sigma_1 & c^T \\ 0 & C_2 \end{pmatrix}.$$
To show that $c^T = 0^T$ consider
$$c^T = y^T C X \overset{(14)}{=} \left( \frac{Cx}{\sigma_1} \right)^T C X = \frac{x^T C^T C X}{\sigma_1}. \tag{15}$$
From (13), $x$ is an eigenvector of $C^T C$, i.e.,
$$C^T C x = \lambda x = \sigma_1^2 x \iff x^T C^T C = \sigma_1^2 x^T.$$
Plugging this into (15) yields
$$c^T = \sigma_1 x^T X = 0,$$
since $R_x = \begin{pmatrix} x & X \end{pmatrix}$ is orthogonal.

156 Singular Value Decomposition

Moreover, $\sigma_1 \ge \|C_2\|_2$ since
$$\sigma_1 = \|C\|_2 \overset{\text{HW}}{=} \|R_y C R_x\|_2 = \max \{\sigma_1, \|C_2\|_2\}.$$
Next, we repeat this process for $C_2$, i.e.,
$$S_y C_2 S_x = \begin{pmatrix} \sigma_2 & 0^T \\ 0 & C_3 \end{pmatrix} \qquad \text{with } \sigma_2 \ge \|C_3\|_2.$$
Let
$$P_2 = \begin{pmatrix} 1 & 0^T \\ 0 & S_y^T \end{pmatrix} R_y^T, \qquad Q_2 = R_x \begin{pmatrix} 1 & 0^T \\ 0 & S_x \end{pmatrix}.$$
Then
$$P_2 C Q_2 = \begin{pmatrix} \sigma_1 & 0 & 0^T \\ 0 & \sigma_2 & 0^T \\ 0 & 0 & C_3 \end{pmatrix} \qquad \text{with } \sigma_1 \ge \sigma_2 \ge \|C_3\|_2.$$

157 Singular Value Decomposition

We continue this until
$$P_{r-1} C Q_{r-1} = \begin{pmatrix} \sigma_1 & & & \\ & \sigma_2 & & \\ & & \ddots & \\ & & & \sigma_r \end{pmatrix} = D, \qquad \sigma_1 \ge \sigma_2 \ge \ldots \ge \sigma_r.$$
Finally, let
$$\tilde{U}^T = \begin{pmatrix} P_{r-1} & O \\ O & I \end{pmatrix} U^T, \qquad \tilde{V} = V \begin{pmatrix} Q_{r-1} & O \\ O & I \end{pmatrix}.$$
Together,
$$\tilde{U}^T A \tilde{V} = \begin{pmatrix} D & O \\ O & O \end{pmatrix},$$
or, without the tildes, the singular value decomposition (SVD) of $A$:
$$A = U \begin{pmatrix} D & O \\ O & O \end{pmatrix} V^T,$$
where $A$ is $m \times n$, $U$ is $m \times m$, $D$ is $r \times r$, and $V$ is $n \times n$.

158 Singular Value Decomposition

We use the following terminology:
singular values: $\sigma_1 \ge \sigma_2 \ge \ldots \ge \sigma_r > 0$,
left singular vectors: columns of $U$,
right singular vectors: columns of $V$.

Remark. In Chapter 7 we will see that the columns of $U$ and $V$ are also special eigenvectors of $A A^T$ and $A^T A$, respectively.

159 Singular Value Decomposition

Geometric interpretation of SVD

Figure: For $A \in \mathbb{R}^{n \times n}$, $n = 2$, the image of the unit circle under $A$ is an ellipse with semi-axes $\sigma_1 u_1$ and $\sigma_2 u_2$, the images of the right singular vectors $v_1$ and $v_2$.

This picture is true since
$$A = U D V^T \iff AV = UD,$$
and $\sigma_1, \sigma_2$ are the lengths of the semi-axes of the ellipse because $\|u_1\| = \|u_2\| = 1$.

Remark. See [Mey00] for more details.

For general $n$, $A$ transforms the 2-norm unit sphere into an ellipsoid whose semi-axes have lengths
\[
\sigma_1 \geq \sigma_2 \geq \ldots \geq \sigma_n.
\]
Therefore,
\[
\kappa_2(A) = \frac{\sigma_1}{\sigma_n}
\]
is the distortion ratio of the transformation $A$. Moreover,
\[
\sigma_1 = \|A\|_2, \qquad \sigma_n = \frac{1}{\|A^{-1}\|_2},
\]
so that $\kappa_2(A) = \|A\|_2 \, \|A^{-1}\|_2$ is the 2-norm condition number of $A$ ($\in \mathbb{R}^{n \times n}$, nonsingular).
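A brief sketch (with a made-up matrix) confirming that the two characterizations of $\kappa_2$ agree, and that NumPy's cond computes the same quantity via the SVD:

```python
import numpy as np

A = np.array([[3., 1.],
              [1., 2.]])                         # made-up nonsingular matrix

s = np.linalg.svd(A, compute_uv=False)           # singular values, descending
kappa_sv   = s[0] / s[-1]                        # sigma_1 / sigma_n
kappa_norm = np.linalg.norm(A, 2) * np.linalg.norm(np.linalg.inv(A), 2)
print(kappa_sv, kappa_norm, np.linalg.cond(A, 2))   # all three agree
```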

Remark
The relations for $\sigma_1$ and $\sigma_n$ hold because
\[
\|A\|_2 = \|U D V^T\|_2 \overset{\text{HW}}{=} \|D\|_2 = \sigma_1, \qquad \|A^{-1}\|_2 = \|V D^{-1} U^T\|_2 \overset{\text{HW}}{=} \|D^{-1}\|_2 = \frac{1}{\sigma_n}.
\]

Remark
We always have $\kappa_2(A) \geq 1$, and $\kappa_2(A) = 1$ if and only if $A$ is a multiple of an orthogonal matrix (typo in [Mey00], see proof on next slide).

Proof
"$\Longleftarrow$": Assume $A = \alpha Q$ with $\alpha > 0$, $Q$ orthogonal. Then
\[
\|A\|_2 = \alpha \|Q\|_2 = \alpha \max_{\|x\|_2 = 1} \|Qx\|_2 \overset{\text{invariance}}{=} \alpha \max_{\|x\|_2 = 1} \|x\|_2 = \alpha.
\]
Also,
\[
A^T A = \alpha^2 Q^T Q = \alpha^2 I \quad \Longrightarrow \quad A^{-1} = \frac{1}{\alpha^2} A^T,
\]
and $\|A^T\|_2 = \|A\|_2$, so that $\|A^{-1}\|_2 = \frac{1}{\alpha}$ and
\[
\kappa_2(A) = \|A\|_2 \, \|A^{-1}\|_2 = \alpha \cdot \frac{1}{\alpha} = 1.
\]

Proof (cont.)
"$\Longrightarrow$": Assume $\kappa_2(A) = \frac{\sigma_1}{\sigma_n} = 1$, so that $\sigma_1 = \sigma_n$ and therefore $D = \sigma_1 I$. Thus
\[
A = U D V^T = \sigma_1 U V^T
\]
and
\[
A^T A = \sigma_1^2 (U V^T)^T U V^T = \sigma_1^2 V U^T U V^T = \sigma_1^2 I,
\]
i.e., $A$ is the multiple $\sigma_1$ of the orthogonal matrix $U V^T$. $\square$

Applications of the Condition Number

Let $\tilde{x}$ be the answer obtained by solving $Ax = b$ with $A \in \mathbb{R}^{n \times n}$. Is a small residual
\[
r = b - A\tilde{x}
\]
a good indicator for the accuracy of $\tilde{x}$?

Since $x$ is the exact answer and $\tilde{x}$ the computed answer, we have the relative error
\[
\frac{\|x - \tilde{x}\|}{\|x\|}.
\]

Now
\[
\|r\| = \|b - A\tilde{x}\| = \|Ax - A\tilde{x}\| = \|A(x - \tilde{x})\| \leq \|A\| \, \|x - \tilde{x}\|.
\]
To get the relative error we multiply by $\frac{\|A^{-1}\| \, \|b\|}{\|x\|} \geq 1$ (which holds since $\|x\| = \|A^{-1} b\| \leq \|A^{-1}\| \, \|b\|$). Then
\[
\|r\| \leq \|A\| \, \|A^{-1}\| \, \frac{\|b\|}{\|x\|} \, \|x - \tilde{x}\| \quad \Longleftrightarrow \quad \frac{\|r\|}{\|b\|} \leq \kappa(A) \, \frac{\|x - \tilde{x}\|}{\|x\|}. \tag{16}
\]

Moreover, using $r = b - A\tilde{x} = b - \tilde{b}$ with $\tilde{b} = A\tilde{x}$,
\[
\|x - \tilde{x}\| = \|A^{-1}(b - \tilde{b})\| \leq \|A^{-1}\| \, \|r\|.
\]
Multiplying by $\frac{\|Ax\|}{\|b\| \, \|x\|} = \frac{1}{\|x\|}$ (since $Ax = b$) and using $\|Ax\| \leq \|A\| \, \|x\|$, we have
\[
\frac{\|x - \tilde{x}\|}{\|x\|} \leq \kappa(A) \, \frac{\|r\|}{\|b\|}. \tag{17}
\]
Combining (16) and (17) yields
\[
\frac{1}{\kappa(A)} \, \frac{\|r\|}{\|b\|} \leq \frac{\|x - \tilde{x}\|}{\|x\|} \leq \kappa(A) \, \frac{\|r\|}{\|b\|}.
\]
Therefore, the relative residual $\frac{\|r\|}{\|b\|}$ is a good indicator of the relative error if and only if $A$ is well conditioned, i.e., $\kappa(A)$ is small (close to 1).
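The two-sided bound can be watched in action: for an ill-conditioned matrix the relative residual and the relative error may differ by many orders of magnitude. A sketch using SciPy's Hilbert matrix (the exact solution x = ones is made up for the test):

```python
import numpy as np
from scipy.linalg import hilbert

n = 12
A = hilbert(n)                          # notoriously ill-conditioned
x = np.ones(n)                          # chosen exact solution
b = A @ x
x_tilde = np.linalg.solve(A, b)         # computed solution

rel_res = np.linalg.norm(b - A @ x_tilde) / np.linalg.norm(b)
rel_err = np.linalg.norm(x - x_tilde) / np.linalg.norm(x)
print(f"kappa_2(A)        = {np.linalg.cond(A, 2):.2e}")
print(f"relative residual = {rel_res:.2e}")   # tiny
print(f"relative error    = {rel_err:.2e}")   # much larger
```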

Applications of the SVD

1. Determination of the numerical rank of $A$: take $\operatorname{rank}(A)$ to be the index of the smallest singular value greater than or equal to a desired threshold.

2. Low-rank approximation of $A$: the Eckart–Young theorem states that
\[
A_k = \sum_{i=1}^{k} \sigma_i u_i v_i^T
\]
is the best rank-$k$ approximation to $A$ in the 2-norm (also in the Frobenius norm), i.e.,
\[
\|A - A_k\|_2 = \min_{\operatorname{rank}(B) = k} \|A - B\|_2.
\]
Moreover,
\[
\|A - A_k\|_2 = \sigma_{k+1}.
\]
Run SVD_movie.m
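A sketch of the Eckart–Young statement with a made-up random matrix: truncating the SVD after $k$ terms leaves an error of exactly $\sigma_{k+1}$ in the 2-norm.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 6))                # made-up test matrix

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 3
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]    # best rank-k approximation

print(np.linalg.matrix_rank(A_k))              # k
print(np.linalg.norm(A - A_k, 2), s[k])        # equal: sigma_{k+1}
```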

3. Stable solution of least squares problems: use the Moore–Penrose pseudoinverse.

Definition
Let $A \in \mathbb{R}^{m \times n}$ and let
\[
A = U \begin{pmatrix} D & O \\ O & O \end{pmatrix} V^T
\]
be the SVD of $A$. Then
\[
A^{\dagger} = V \begin{pmatrix} D^{-1} & O \\ O & O \end{pmatrix} U^T
\]
is called the Moore–Penrose pseudoinverse of $A$.

Remark
Note that $A^{\dagger} \in \mathbb{R}^{n \times m}$ and
\[
A^{\dagger} = \sum_{i=1}^{r} \frac{v_i u_i^T}{\sigma_i}, \qquad r = \operatorname{rank}(A).
\]
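Building $A^{\dagger}$ directly from this definition and comparing with NumPy's pinv, as a sketch with a made-up 3 x 2 matrix:

```python
import numpy as np

A = np.array([[1., 2.],
              [2., 4.],
              [0., 1.]])                        # made-up 3 x 2 matrix, rank 2

U, s, Vt = np.linalg.svd(A, full_matrices=True)
r = int(np.sum(s > 1e-12))

Sig_dag = np.zeros((A.shape[1], A.shape[0]))    # n x m block (D^{-1} O; O O)
Sig_dag[:r, :r] = np.diag(1.0 / s[:r])

A_dag = Vt.T @ Sig_dag @ U.T
print(np.allclose(A_dag, np.linalg.pinv(A)))    # True
```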

We now show that the least squares solution of $Ax = b$ is given by $x = A^{\dagger} b$.

Start with the normal equations and use
\[
A = U \begin{pmatrix} D & O \\ O & O \end{pmatrix} V^T = \tilde{U} D \tilde{V}^T,
\]
the reduced SVD of $A$, i.e., $\tilde{U} \in \mathbb{R}^{m \times r}$, $\tilde{V} \in \mathbb{R}^{n \times r}$:
\[
A^T A x = A^T b \quad \Longleftrightarrow \quad \tilde{V} D \underbrace{\tilde{U}^T \tilde{U}}_{=I} D \tilde{V}^T x = \tilde{V} D \tilde{U}^T b \quad \Longleftrightarrow \quad \tilde{V} D^2 \tilde{V}^T x = \tilde{V} D \tilde{U}^T b.
\]
Multiplication by $D^{-1} \tilde{V}^T$ yields
\[
D \tilde{V}^T x = \tilde{U}^T b.
\]

Thus
\[
D \tilde{V}^T x = \tilde{U}^T b
\]
implies
\[
x = \tilde{V} D^{-1} \tilde{U}^T b = V \begin{pmatrix} D^{-1} & O \\ O & O \end{pmatrix} U^T b = A^{\dagger} b.
\]

Remark
If $A$ is nonsingular, then $A^{\dagger} = A^{-1}$ (see HW).

If $\operatorname{rank}(A) < n$ (i.e., the least squares solution is not unique), then $x = A^{\dagger} b$ provides the unique solution with minimum 2-norm (see justification on the following slide).

Minimum norm solution of underdetermined systems

Note that the general solution of $Ax = b$ is given by
\[
z = A^{\dagger} b + n, \qquad n \in N(A).
\]
Then
\[
\|z\|_2^2 = \|A^{\dagger} b + n\|_2^2 \overset{\text{Pythag. thm}}{=} \|A^{\dagger} b\|_2^2 + \|n\|_2^2 \geq \|A^{\dagger} b\|_2^2.
\]
The Pythagorean theorem applies since (see HW)
\[
A^{\dagger} b \in R(A^{\dagger}) = R(A^T),
\]
so that, using $R(A^T)^{\perp} = N(A)$, we get $A^{\dagger} b \perp n$.
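A sketch of the minimum-norm property for a made-up underdetermined system: any other solution is the pseudoinverse solution plus a null-space component, and is therefore longer.

```python
import numpy as np
from scipy.linalg import null_space

A = np.array([[1., 2., 3.],
              [0., 1., 1.]])             # made-up 2 x 3 system, rank 2
b = np.array([6., 2.])

x_min = np.linalg.pinv(A) @ b            # minimum 2-norm solution
N = null_space(A)                        # ON basis of N(A), here one column

z = x_min + 5.0 * N[:, 0]                # another solution of Ax = b
print(np.allclose(A @ z, b))             # True
print(np.linalg.norm(x_min) < np.linalg.norm(z))   # True: x_min is shortest
print(abs(x_min @ N[:, 0]) < 1e-12)      # True: x_min is orthogonal to N(A)
```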

Remark
Explicit use of the pseudoinverse is usually not recommended. Instead we solve $Ax = b$, $A \in \mathbb{R}^{m \times n}$, as sketched below:

1. Compute $A = \tilde{U} D \tilde{V}^T$ (reduced SVD).
2. Use $Ax = b \iff D \tilde{V}^T x = \tilde{U}^T b$, so
   1. solve $D y = \tilde{U}^T b$ for $y$,
   2. compute $x = \tilde{V} y$.
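A minimal sketch of this procedure for a made-up overdetermined system; np.linalg.svd with full_matrices=False returns the reduced SVD, and the diagonal solve is an elementwise division.

```python
import numpy as np

A = np.array([[1., 0.],
              [1., 1.],
              [1., 2.]])                 # made-up overdetermined system
b = np.array([1., 0., 2.])

U, s, Vt = np.linalg.svd(A, full_matrices=False)   # reduced SVD
y = (U.T @ b) / s                        # step 1: solve D y = U~^T b
x = Vt.T @ y                             # step 2: x = V~ y

print(np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0]))   # True
```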

Other Applications

Also known as principal component analysis (PCA), (discrete) Karhunen–Loève (KL) transformation, Hotelling transform, or proper orthogonal decomposition (POD):

- Data compression
- Noise filtering
- Regularization of inverse problems
- Tomography
- Image deblurring
- Seismology
- Information retrieval and data mining (latent semantic analysis)
- Bioinformatics and computational biology
  - Immunology
  - Molecular dynamics
  - Microarray data analysis

Orthogonal Projections

Earlier we discussed orthogonal complementary subspaces of an inner product space $V$, i.e.,
\[
V = M \oplus M^{\perp}.
\]

Definition
Consider $V = M \oplus M^{\perp}$, so that for every $v \in V$ there exist unique vectors $m \in M$, $n \in M^{\perp}$ such that
\[
v = m + n.
\]
Then $m$ is called the orthogonal projection of $v$ onto $M$. The matrix $P_M$ such that $P_M v = m$ is the orthogonal projector onto $M$ along $M^{\perp}$.

For arbitrary complementary subspaces $\mathcal{X}$, $\mathcal{Y}$ we showed earlier that the projector onto $\mathcal{X}$ along $\mathcal{Y}$ is given by
\[
P = \begin{pmatrix} X & O \end{pmatrix} \begin{pmatrix} X & Y \end{pmatrix}^{-1} = \begin{pmatrix} X & Y \end{pmatrix} \begin{pmatrix} I & O \\ O & O \end{pmatrix} \begin{pmatrix} X & Y \end{pmatrix}^{-1},
\]
where the columns of $X$ and $Y$ are bases for $\mathcal{X}$ and $\mathcal{Y}$.

Now we let $\mathcal{X} = M$ and $\mathcal{Y} = M^{\perp}$ be orthogonal complementary subspaces, where $M$ and $N$ contain the basis vectors of $M$ and $M^{\perp}$ in their columns. Then
\[
P = \begin{pmatrix} M & O \end{pmatrix} \begin{pmatrix} M & N \end{pmatrix}^{-1}. \tag{18}
\]
To find $\begin{pmatrix} M & N \end{pmatrix}^{-1}$ we note that $M^T N = N^T M = O$, and if $N$ is an orthogonal matrix (i.e., contains an ON basis), then
\[
\begin{pmatrix} (M^T M)^{-1} M^T \\ N^T \end{pmatrix} \begin{pmatrix} M & N \end{pmatrix} = \begin{pmatrix} I & O \\ O & I \end{pmatrix}
\]
(note that $M^T M$ is invertible since $M$ has full rank because its columns form a basis of $M$).

Thus
\[
\begin{pmatrix} M & N \end{pmatrix}^{-1} = \begin{pmatrix} (M^T M)^{-1} M^T \\ N^T \end{pmatrix}. \tag{19}
\]
Inserting (19) into (18) yields
\[
P_M = \begin{pmatrix} M & O \end{pmatrix} \begin{pmatrix} (M^T M)^{-1} M^T \\ N^T \end{pmatrix} = M (M^T M)^{-1} M^T.
\]

Remark
Note that $P_M$ is unique, so this formula holds for an arbitrary basis of $M$ (collected in $M$). In particular, if $M$ contains an ON basis for $M$, then $P_M = M M^T$.
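A sketch of this basis independence, with a made-up basis of a plane in $\mathbb{R}^3$: the projector built from an arbitrary basis coincides with $QQ^T$ built from an ON basis of the same subspace.

```python
import numpy as np

M = np.array([[1., 1.],
              [0., 1.],
              [1., 0.]])                 # made-up basis of a plane in R^3

P = M @ np.linalg.inv(M.T @ M) @ M.T     # P_M = M (M^T M)^{-1} M^T

Q, _ = np.linalg.qr(M)                   # ON basis of the same subspace
print(np.allclose(P, Q @ Q.T))           # same projector: True
print(np.allclose(P, P.T))               # symmetric
print(np.allclose(P @ P, P))             # idempotent
```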

Similarly,
\[
P_{M^{\perp}} = N (N^T N)^{-1} N^T \quad \text{(arbitrary basis for } M^{\perp}\text{)}, \qquad P_{M^{\perp}} = N N^T \quad \text{(ON basis)}.
\]
As before, $P_{M^{\perp}} = I - P_M$.

Example
If $M = \operatorname{span}\{u\}$ with $\|u\| = 1$, then $P_M = P_u = u u^T$ and $P_{u^{\perp}} = I - P_u = I - u u^T$ (cf. elementary orthogonal projectors earlier).

Properties of orthogonal projectors

Theorem
Let $P \in \mathbb{R}^{n \times n}$ be a projector, i.e., $P^2 = P$. Then $P$ is an orthogonal projector if and only if any one of the following equivalent conditions holds:
1. $R(P) \perp N(P)$,
2. $P^T = P$,
3. $\|P\|_2 = 1$.

Proof
1. Follows directly from the definition.
2. "$\Longrightarrow$": Assume $P$ is an orthogonal projector, i.e., $P = M (M^T M)^{-1} M^T$. Then
\[
P^T = M \underbrace{\left( (M^T M)^{-1} \right)^T}_{=(M^T M)^{-1}} M^T = P.
\]
"$\Longleftarrow$": Assume $P = P^T$. Then
\[
R(P) = R(P^T) \overset{\text{Orth. decomp.}}{=} N(P)^{\perp},
\]
so that $P$ is an orthogonal projector via (1).

Proof (cont.)
3. For complementary subspaces $\mathcal{X}$, $\mathcal{Y}$ we know that the angle $\theta \in \left[0, \frac{\pi}{2}\right]$ between $\mathcal{X}$ and $\mathcal{Y}$ satisfies
\[
\|P\|_2 = \frac{1}{\sin \theta}.
\]
Assume $P$ is an orthogonal projector; then $\theta = \frac{\pi}{2}$, so that $\|P\|_2 = 1$. Conversely, if $\|P\|_2 = 1$, then $\theta = \frac{\pi}{2}$ and $\mathcal{X}$, $\mathcal{Y}$ are orthogonal complements, i.e., $P$ is an orthogonal projector. $\square$

Why is orthogonal projection so important?

Theorem
Let $V$ be an inner product space with subspace $M$, and let $b \in V$. Then
\[
\operatorname{dist}(b, M) = \min_{m \in M} \|b - m\|_2 = \|b - P_M b\|_2,
\]
i.e., $P_M b$ is the unique vector in $M$ closest to $b$. The quantity $\operatorname{dist}(b, M)$ is called the (orthogonal) distance from $b$ to $M$.
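A tiny numerical sketch of the theorem with a made-up example: the distance from $b$ to the $xy$-plane in $\mathbb{R}^3$ is the norm of the component removed by the projector.

```python
import numpy as np

M = np.array([[1., 0.],
              [0., 1.],
              [0., 0.]])                 # ON basis of the xy-plane in R^3
b = np.array([1., 2., 5.])

P_M = M @ M.T                            # ON basis, so P_M = M M^T
print(np.linalg.norm(b - P_M @ b))       # dist(b, M) = 5.0 (the z-component)
```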

Proof
Let $p = P_M b$. Then $p \in M$ and $p - m \in M$ for every $m \in M$. Moreover,
\[
b - p = (I - P_M) b \in M^{\perp},
\]
so that $(p - m) \perp (b - p)$. Then
\[
\|b - m\|_2^2 = \|b - p + p - m\|_2^2 \overset{\text{Pythag.}}{=} \|b - p\|_2^2 + \|p - m\|_2^2 \geq \|b - p\|_2^2.
\]
Therefore $\min_{m \in M} \|b - m\|_2 = \|b - p\|_2$.

Proof (cont.)
Uniqueness: Assume there exists a $q \in M$ such that
\[
\|b - q\|_2 = \|b - p\|_2. \tag{20}
\]
Then
\[
\|b - q\|_2^2 = \big\| \underbrace{b - p}_{\in M^{\perp}} + \underbrace{p - q}_{\in M} \big\|_2^2 \overset{\text{Pythag.}}{=} \|b - p\|_2^2 + \|p - q\|_2^2.
\]
But then (20) implies that $\|p - q\|_2^2 = 0$ and therefore $p = q$. $\square$

Least squares approximation revisited

Now we give a modern derivation of the normal equations (without calculus), and note that much of this remains true for best $L_2$ approximation.

Goal of least squares: for $A \in \mathbb{R}^{m \times n}$, find
\[
\min_{x \in \mathbb{R}^n} \sum_{i=1}^{m} \left( (Ax)_i - b_i \right)^2 \quad \Longleftrightarrow \quad \min_{x \in \mathbb{R}^n} \|Ax - b\|_2^2.
\]
Now $Ax \in R(A)$, so the least squares error is
\[
\operatorname{dist}(b, R(A)) = \min_{Ax \in R(A)} \|b - Ax\|_2 = \|b - P_{R(A)} b\|_2,
\]
with $P_{R(A)}$ the orthogonal projector onto $R(A)$.
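A closing sketch (with a made-up full-column-rank system) connecting the two viewpoints: the least squares fit $A\hat{x}$ is exactly the orthogonal projection of $b$ onto $R(A)$, and the residual is orthogonal to $R(A)$.

```python
import numpy as np

A = np.array([[1., 0.],
              [1., 1.],
              [1., 2.]])                 # made-up matrix, full column rank
b = np.array([6., 0., 0.])

x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)
P = A @ np.linalg.inv(A.T @ A) @ A.T     # orthogonal projector onto R(A)

print(np.allclose(A @ x_ls, P @ b))            # fit = projection: True
print(np.allclose(A.T @ (b - A @ x_ls), 0.0))  # residual _|_ R(A): True
```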
