Linear Algebra 18: Orthogonality
Wei-Shi Zheng, wszheng@ieee.org, 2011

1 What Do You Learn from This Note

We return to the unit vectors introduced in Chapter 1:

$$e_1 = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \quad e_2 = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}, \quad e_3 = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix} \tag{1}$$

Question: Have you tried to compute inner products such as $e_1 \cdot e_2$, $e_1 \cdot e_3$ and $e_2 \cdot e_3$? You will find that

$$e_1 \cdot e_2 = 0, \qquad e_1 \cdot e_3 = 0, \qquad e_2 \cdot e_3 = 0.$$

That is, the three standard basis vectors are mutually orthogonal; more precisely, they are orthonormal. A basis of this type is called an orthogonal (orthonormal) basis, and it is the subject of this lecture note.

Basic concepts: orthogonal set, orthogonal basis, orthogonal matrix, orthogonal projection, Gram-Schmidt process.

2 Orthogonal Basis

2.1 Definition

Definition 1 (orthogonal set). Let $S = \{v_1, \dots, v_r\} \subseteq \mathbb{R}^n \setminus \{0\}$. We say $S$ is an orthogonal set iff for any $i, j = 1, \dots, r$ with $i \neq j$ we have $v_i \cdot v_j = 0$. Furthermore, if $v_1, \dots, v_r$ are all unit vectors, then $S$ is called an orthonormal set.
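A quick numerical sketch of this observation (assuming NumPy is available; the snippet is illustrative and not part of the textbook):

```python
import numpy as np

# The rows of the 3x3 identity matrix are the standard basis vectors.
e1, e2, e3 = np.eye(3)

# All pairwise inner products vanish, so {e1, e2, e3} is orthogonal.
print(e1 @ e2, e1 @ e3, e2 @ e3)                  # 0.0 0.0 0.0
# Each vector has unit length, so the set is in fact orthonormal.
print([np.linalg.norm(e) for e in (e1, e2, e3)])  # [1.0, 1.0, 1.0]
```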
Example: Textbook P384.

Definition 2 (orthogonal basis). A basis $B$ of $\mathbb{R}^n$ which is also an orthogonal set is called an orthogonal basis. Furthermore, $B$ is called an orthonormal basis if it is orthonormal.

Example: The standard basis $E = \{e_1, \dots, e_n\}$ is an orthonormal basis of $\mathbb{R}^n$.

Example: Textbook P389.

2.2 Some Properties

Connection between orthogonal sets and linear independence:

Theorem 3. Let $S = \{v_1, \dots, v_r\}$ be an orthogonal set. Then $S$ is linearly independent.

Proof. Let $c_1, \dots, c_r$ be scalars and suppose that $0 = c_1 v_1 + \dots + c_r v_r$. Then for each $i = 1, \dots, r$ we have

$$0 = v_i \cdot 0 = v_i \cdot (c_1 v_1 + \dots + c_r v_r) = c_1 (v_i \cdot v_1) + \dots + c_r (v_i \cdot v_r) = c_i (v_i \cdot v_i),$$

since $v_i \cdot v_j = 0$ for $j \neq i$. Since $v_i \cdot v_i \neq 0$, we must have $c_i = 0$, and the result follows.

Orthogonal matrices:

Theorem 4. Let $U = (v_1 \cdots v_r) \in \mathbb{R}^{n \times r}$. Then

1. $\{v_1, \dots, v_r\}$ is orthogonal iff $U^T U$ is invertible and diagonal;
2. $\{v_1, \dots, v_r\}$ is orthonormal iff $U^T U = I_r$.

Proof. 1. Since $[U^T U]_{ij} = v_i^T v_j = v_i \cdot v_j$, the set $\{v_1, \dots, v_r\}$ is orthogonal iff

$$[U^T U]_{ij} = \begin{cases} v_i \cdot v_i \neq 0 & \text{if } i = j; \\ 0 & \text{otherwise}, \end{cases}$$

i.e. iff $U^T U$ is invertible and diagonal.

2. Similar to 1.
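A small sketch of Theorem 4 in NumPy (the matrices below are our own illustrative choices):

```python
import numpy as np

# Columns are orthogonal but not of unit length.
V = np.array([[1.0,  2.0],
              [2.0, -1.0],
              [0.0,  0.0]])
print(V.T @ V)  # diagonal and invertible: [[5, 0], [0, 5]]

# Normalising each column makes the set orthonormal.
U = V / np.linalg.norm(V, axis=0)
print(U.T @ U)  # the identity I_2 (up to floating-point error)
```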
Definition 5 (orthogonal matrix). A matrix $U \in \mathbb{R}^{n \times n}$ is said to be orthogonal iff $U^T U = I_n$ (equivalently, $U^T = U^{-1}$).

Example: For any $\theta \in \mathbb{R}$, the rotation matrix $\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}$ is orthogonal.

Theorem 6.
1. $I_n$ is orthogonal.
2. If $U \in \mathbb{R}^{n \times n}$ is orthogonal, then so is $U^{-1}$.
3. If $U_1, U_2 \in \mathbb{R}^{n \times n}$ are orthogonal, then so is $U_1 U_2$.

(So the set of orthogonal matrices of size $n$ is a group under multiplication.)

Proof. Easy; left as an exercise.

Why is an orthogonal basis useful?

Theorem 7. Let $B = \{b_1, \dots, b_n\}$ be an orthogonal basis of $\mathbb{R}^n$. Then for any $v \in \mathbb{R}^n$ we have

$$[v]_B = \left( \frac{v \cdot b_1}{b_1 \cdot b_1}, \; \dots, \; \frac{v \cdot b_n}{b_n \cdot b_n} \right)^T.$$

So if $B$ is orthonormal, then

$$[v]_B = (v \cdot b_1, \; \dots, \; v \cdot b_n)^T.$$

Proof. Suppose $[v]_B = (c_1, \dots, c_n)^T$. Then for $i = 1, \dots, n$ we have

$$v \cdot b_i = (c_1 b_1 + \dots + c_n b_n) \cdot b_i = c_1 (b_1 \cdot b_i) + \dots + c_n (b_n \cdot b_i) = c_i (b_i \cdot b_i).$$

So $c_i = (v \cdot b_i) / (b_i \cdot b_i)$.

Example: Textbook P385.
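Theorem 7 says coordinates with respect to an orthogonal basis require no system-solving, only dot products. A minimal sketch (the basis and vector are our own illustrative choices):

```python
import numpy as np

# An orthogonal (not orthonormal) basis of R^2.
b1 = np.array([1.0, 1.0])
b2 = np.array([1.0, -1.0])
v  = np.array([3.0, 5.0])

# Coordinates by Theorem 7: c_i = (v . b_i) / (b_i . b_i).
c1 = (v @ b1) / (b1 @ b1)   # 8/2 = 4
c2 = (v @ b2) / (b2 @ b2)   # -2/2 = -1
print(c1 * b1 + c2 * b2)    # reconstructs v: [3. 5.]
```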
Theorem 8. Let $B = \{b_1, \dots, b_n\}$ be an orthonormal basis of $\mathbb{R}^n$. Then for any $u, v \in \mathbb{R}^n$:

1. $u \cdot v = [u]_B \cdot [v]_B$;
2. $\|v\| = \|[v]_B\|$;
3. $d(u, v) = d([u]_B, [v]_B)$.

This means the coordinate isomorphism with respect to $B$ preserves the dot product, the norm and the distance; in other words, the geometric structure of $\mathbb{R}^n$ is preserved.

Proof. 1. By Theorem 7,

$$[u]_B \cdot [v]_B = (u \cdot b_1)(v \cdot b_1) + \dots + (u \cdot b_n)(v \cdot b_n) = u \cdot \big( (v \cdot b_1) b_1 + \dots + (v \cdot b_n) b_n \big) = u \cdot v,$$

where the last equality holds because $(v \cdot b_1) b_1 + \dots + (v \cdot b_n) b_n = v$.

2 and 3 are derived from 1 directly.

Theorem 9. $(u_1 \cdots u_n)$ is an orthogonal matrix iff $\{u_1, \dots, u_n\}$ is an orthonormal basis.

Theorem 10. Let $B$ be an orthonormal basis and $B'$ a basis of $\mathbb{R}^n$. Then $B'$ is an orthonormal basis iff the change-of-coordinates matrix $[B']_B$ is orthogonal.

Proof. Suppose that $B' = \{b'_1, \dots, b'_n\}$. Then

$B'$ is an orthonormal basis
$\iff \{b'_1, \dots, b'_n\}$ is orthonormal
$\iff \{[b'_1]_B, \dots, [b'_n]_B\}$ is orthonormal [by Theorem 8]
$\iff ([b'_1]_B \cdots [b'_n]_B)$ is orthogonal [by Theorem 9].
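A numerical spot-check of Theorem 8 (a sketch with an illustrative rotation basis; for an orthonormal basis stored as the columns of $B$, the coordinate map is simply $x \mapsto B^T x$):

```python
import numpy as np

# An orthonormal basis of R^2: the columns of a rotation by 30 degrees.
t = np.pi / 6
B = np.array([[np.cos(t), -np.sin(t)],
              [np.sin(t),  np.cos(t)]])

u = np.array([1.0, 2.0])
v = np.array([3.0, -1.0])

# Coordinates w.r.t. an orthonormal basis (Theorem 7): [x]_B = B^T x.
uB, vB = B.T @ u, B.T @ v

print(u @ v, uB @ vB)                         # equal dot products
print(np.linalg.norm(v), np.linalg.norm(vB))  # equal norms
```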
3 Orthogonal Projection

Note: you are expected to fully master the proofs in this section.

Theorem 11 (The Orthogonal Decomposition Theorem). Let $W$ be a subspace of $\mathbb{R}^n$. Then each $y$ in $\mathbb{R}^n$ can be written uniquely in the form $y = \hat{y} + z$, where $\hat{y}$ is in $W$ and $z$ is in $W^\perp$. In fact, if $\{u_1, \dots, u_p\}$ is any orthogonal basis of $W$, then

$$\hat{y} = \frac{y \cdot u_1}{u_1 \cdot u_1} u_1 + \dots + \frac{y \cdot u_p}{u_p \cdot u_p} u_p \qquad \text{and} \qquad z = y - \hat{y}.$$

Proof. Sketch of the proof:
1. Prove $\hat{y} \in W$ (it is a linear combination of the basis vectors of $W$).
2. Prove $z \in W^\perp$ (check $z \perp u_i$ for all $u_i$).
3. Prove the uniqueness of the decomposition.

Definition 12 (orthogonal projection). The orthogonal projection of $y$ onto a subspace $W$ is

$$\mathrm{proj}_W\, y = \frac{y \cdot u_1}{u_1 \cdot u_1} u_1 + \dots + \frac{y \cdot u_p}{u_p \cdot u_p} u_p,$$

where $\{u_1, \dots, u_p\}$ is any orthogonal basis of $W$.

Theorem 13. If $\{u_1, \dots, u_p\}$ is any orthonormal basis of a subspace $W$ of $\mathbb{R}^n$, then

$$\mathrm{proj}_W\, y = (y \cdot u_1) u_1 + \dots + (y \cdot u_p) u_p.$$

If $U = (u_1 \cdots u_p)$, then $\mathrm{proj}_W\, y = U U^T y$, as illustrated in the sketch below.
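A minimal sketch of Theorem 13, using an obvious subspace so the answer can be read off by eye (the matrices are our own illustrative choices):

```python
import numpy as np

# W = the xy-plane in R^3, given by two orthonormal columns.
U = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])
y = np.array([2.0, 3.0, 7.0])

y_hat = U @ U.T @ y   # proj_W y, by Theorem 13
z = y - y_hat         # the component of y in W-perp
print(y_hat)          # [2. 3. 0.]
print(U.T @ z)        # z is orthogonal to W: [0. 0.]
```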
If we replace the subspace $W$ in the above theorems by the special subspace $L = \mathrm{Span}\{u\}$, we still have:

Theorem 14. For any vector $y \in \mathbb{R}^n$,

$$\mathrm{proj}_L\, y = \frac{y \cdot u}{u \cdot u} u,$$

and $z = y - \mathrm{proj}_L\, y$ is orthogonal to $u$. We call $\mathrm{proj}_L\, y$ the orthogonal projection of $y$ onto $L$.

Theorem 15 (The Best Approximation Theorem). Let $W$ be a subspace of $\mathbb{R}^n$ and let $y$ be any vector in $\mathbb{R}^n$. Then $\mathrm{proj}_W\, y$ is the closest point in $W$ to $y$, in the sense that

$$\|y - \mathrm{proj}_W\, y\| < \|y - v\|$$

for all $v$ in $W$ distinct from $\hat{y} = \mathrm{proj}_W\, y$.

Proof. Write $y - v = (y - \mathrm{proj}_W\, y) + (\mathrm{proj}_W\, y - v)$. Since $y - \mathrm{proj}_W\, y \in W^\perp$ and $\mathrm{proj}_W\, y - v \in W$, the two summands are orthogonal, so by the Pythagorean theorem

$$\|y - v\|^2 = \|y - \mathrm{proj}_W\, y\|^2 + \|\mathrm{proj}_W\, y - v\|^2.$$

Since $v \neq \mathrm{proj}_W\, y$, the second term is positive, hence $\|y - v\|^2 > \|y - \mathrm{proj}_W\, y\|^2$.
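A randomized spot-check of Theorem 15 (illustrative only; the subspace and the sample points in $W$ are generated at random):

```python
import numpy as np

rng = np.random.default_rng(0)

# W: a 2-dimensional subspace of R^4, via orthonormal columns of Q.
Q, _ = np.linalg.qr(rng.standard_normal((4, 2)))
y = rng.standard_normal(4)
y_hat = Q @ Q.T @ y   # proj_W y (Theorem 13)

# No sampled point of W gets closer to y than proj_W y does.
for _ in range(5):
    v = Q @ rng.standard_normal(2)   # a random vector in W
    assert np.linalg.norm(y - y_hat) <= np.linalg.norm(y - v)
print("best-approximation property verified on random samples")
```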
4 Where to Use the Best Approximation Theorem

Question: When the matrix equation $Ax = b$ has no solution in a real scenario, do we simply give up? No!

Let $A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^m$. If the matrix equation $Ax = b$ has no solution, we are still required to find an $x \in \mathbb{R}^n$ such that $Ax$ is the best approximation of $b$ in some sense. A feasible approach is to find $x$ such that the distance between $b$ and $Ax$ is as small as possible. We formulate this idea as follows:

Definition 16 (least squares solution, version 1). Let $A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^m$. A least squares solution of $Ax = b$ is an $\hat{x} \in \mathbb{R}^n$ such that

$$\|b - A\hat{x}\| \le \|b - Ax\| \quad \text{for all } x \in \mathbb{R}^n.$$

The distance $d(b, A\hat{x}) = \|b - A\hat{x}\|$ is called the least squares error of $Ax = b$.

Define $W = \mathrm{Col}(A)$. Notice that $w = Ax$ for some $x \in \mathbb{R}^n$ iff $w \in W$, and by Theorem 15 the closest vector to $b$ among all vectors in $W$ is the orthogonal projection $b_W = \mathrm{proj}_W\, b$ of $b$ onto $W$, which is unique. So Definition 16 can be recast equivalently as:

Definition 17 (least squares solution, version 2). Let $A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^m$. A least squares solution of $Ax = b$ is an $\hat{x} \in \mathbb{R}^n$ such that $A\hat{x} = b_W$, where $W = \mathrm{Col}(A)$.
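A small sketch of such an inconsistent system ($A$ and $b$ are our own illustrative choices; `np.linalg.lstsq` returns a least squares solution in the sense of Definition 16, and the next section derives how to compute one by hand):

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([6.0, 0.0, 0.0])

# b is not in Col(A): appending b to A raises the rank,
# so Ax = b has no exact solution.
print(np.linalg.matrix_rank(A),
      np.linalg.matrix_rank(np.column_stack([A, b])))  # 2 3

x_hat = np.linalg.lstsq(A, b, rcond=None)[0]
print(x_hat)                           # [2. 0.]
print(np.linalg.norm(b - A @ x_hat))   # the least squares error
```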
Question: How do we compute a least squares solution?

The least squares solution set of a matrix equation over $\mathbb{R}$ is identified completely as follows:

Theorem 18. Let $A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^m$. Then the least squares solution set of the equation $Ax = b$ is precisely the solution set of the equation

$$A^T A x = A^T b.$$

Proof. Define $W = \mathrm{Col}(A)$. Then

$A\hat{x} = b_W$
$\iff b - A\hat{x} \in W^\perp$
$\iff b - A\hat{x} \in \mathrm{Nul}(A^T)$ [since $W^\perp = \mathrm{Nul}(A^T)$]
$\iff A^T (b - A\hat{x}) = 0$
$\iff A^T A \hat{x} = A^T b.$

So the result follows.

Examples: Textbook P411, P412, P413.

Remark: If $\{a_1, \dots, a_n\}$, the set of columns of $A$, is an orthogonal set, then

$$b_W = \frac{b \cdot a_1}{a_1 \cdot a_1} a_1 + \dots + \frac{b \cdot a_n}{a_n \cdot a_n} a_n,$$

so $\hat{x}$ can be written down directly as

$$\hat{x} = \left( \frac{b \cdot a_1}{a_1 \cdot a_1}, \; \dots, \; \frac{b \cdot a_n}{a_n \cdot a_n} \right)^T.$$

Example: Textbook P414.
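Continuing the earlier sketch, the normal equations of Theorem 18 give the same answer (same illustrative $A$ and $b$ as before):

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([6.0, 0.0, 0.0])

# Solve A^T A x = A^T b (Theorem 18).
x_hat = np.linalg.solve(A.T @ A, A.T @ b)
print(x_hat)                  # [2. 0.], matching np.linalg.lstsq

# Check: the residual b - A x_hat lies in Nul(A^T).
print(A.T @ (b - A @ x_hat))  # ~ [0. 0.]
```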
Question: What happens when $A^T A$ is invertible?

If $A^T A$ is invertible, then the matrix equation $Ax = b$ has a unique least squares solution, namely $(A^T A)^{-1} A^T b$. But this is not always the case. The next theorem gives a necessary and sufficient condition for $A^T A$ to be invertible.

Lemma 19. Let $A \in \mathbb{R}^{m \times n}$. Then
1. $\mathrm{Nul}(A) = \mathrm{Nul}(A^T A)$;
2. $\mathrm{rank}(A) = \mathrm{rank}(A^T A)$.

Proof. 1. Suppose that $x \in \mathrm{Nul}(A)$, namely $Ax = 0$. Then $A^T A x = A^T 0 = 0$, so $x \in \mathrm{Nul}(A^T A)$; that is, $\mathrm{Nul}(A) \subseteq \mathrm{Nul}(A^T A)$. Conversely, suppose that $x \in \mathrm{Nul}(A^T A)$, namely $A^T A x = 0$. Then

$$0 = x^T A^T A x = (Ax)^T (Ax) = \|Ax\|^2,$$

which forces $Ax = 0$. So $x \in \mathrm{Nul}(A)$ and $\mathrm{Nul}(A^T A) \subseteq \mathrm{Nul}(A)$.

2. By the rank-nullity theorem,

$$\mathrm{rank}(A) = n - \dim \mathrm{Nul}(A) = n - \dim \mathrm{Nul}(A^T A) = \mathrm{rank}(A^T A).$$

Theorem 20. Let $A = (a_1 \cdots a_n) \in \mathbb{R}^{m \times n}$. Then $A^T A$ is invertible iff $\{a_1, \dots, a_n\}$ is linearly independent.

Proof. Notice that $A^T A \in \mathbb{R}^{n \times n}$. Then

$A^T A$ is invertible
$\iff \mathrm{rank}(A^T A) = n$
$\iff \mathrm{rank}(A) = n$ [by Lemma 19]
$\iff \dim \mathrm{Col}(A) = \dim \mathrm{Span}\{a_1, \dots, a_n\} = n$
$\iff \{a_1, \dots, a_n\}$ is linearly independent.
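A sketch of Lemma 19 and Theorem 20 on random data (the construction of the dependent column is our own illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((5, 3))              # 3 independent columns
A = np.column_stack([B, B[:, 0] + B[:, 1]])  # append a dependent column

# rank(A) = rank(A^T A) = 3 < 4, so the 4x4 matrix A^T A is singular.
print(np.linalg.matrix_rank(A), np.linalg.matrix_rank(A.T @ A))  # 3 3

# Dropping the dependent column restores invertibility of A^T A.
print(np.linalg.matrix_rank(B.T @ B))        # 3, so B^T B is invertible
```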
5 Constructing an Orthogonal Basis by the Gram-Schmidt Process

Question: We have seen the usefulness of orthogonal/orthonormal bases. But how can we generate them?

Note: this section shows how to construct an orthogonal basis from a given set of vectors. We shall do this by means of the Gram-Schmidt process.

Theorem 21 (the Gram-Schmidt process). Let $\{w_1, \dots, w_r\} \subseteq \mathbb{R}^n$ be a linearly independent set. Then we can construct an orthogonal set $\{v_1, \dots, v_r\} \subseteq \mathbb{R}^n$ such that

$$\mathrm{Span}\{v_1, \dots, v_r\} = \mathrm{Span}\{w_1, \dots, w_r\}.$$

Furthermore, by normalising $\{v_1, \dots, v_r\}$, we obtain an orthonormal set $\{u_1, \dots, u_r\} \subseteq \mathbb{R}^n$ such that $\mathrm{Span}\{u_1, \dots, u_r\} = \mathrm{Span}\{w_1, \dots, w_r\}$.

Proof. We prove the first part of the theorem by induction on $r$.

STEP 1: $r = 1$. We can take $v_1 = w_1$ and the result is trivial.

STEP 2: Inductive hypothesis. Assume the result holds for $r - 1$.

STEP 3: Inductive step. Since $\{w_1, \dots, w_{r-1}\}$ is linearly independent, by the inductive hypothesis we can construct an orthogonal set $\{v_1, \dots, v_{r-1}\}$ such that $\mathrm{Span}\{v_1, \dots, v_{r-1}\} = \mathrm{Span}\{w_1, \dots, w_{r-1}\}$. Now $v_i \neq 0$ for all $i = 1, \dots, r-1$, so

$$v_r = w_r - \frac{w_r \cdot v_1}{v_1 \cdot v_1} v_1 - \dots - \frac{w_r \cdot v_{r-1}}{v_{r-1} \cdot v_{r-1}} v_{r-1}$$

is well defined. It is obvious that $\mathrm{Span}\{v_1, \dots, v_r\} = \mathrm{Span}\{w_1, \dots, w_r\}$,
so $v_r \neq 0$ (otherwise this span would have dimension less than $r$). Also, for $i = 1, \dots, r-1$, we have

$$v_r \cdot v_i = w_r \cdot v_i - \frac{w_r \cdot v_1}{v_1 \cdot v_1} (v_1 \cdot v_i) - \dots - \frac{w_r \cdot v_{r-1}}{v_{r-1} \cdot v_{r-1}} (v_{r-1} \cdot v_i) = w_r \cdot v_i - \frac{w_r \cdot v_i}{v_i \cdot v_i} (v_i \cdot v_i) = 0,$$

since $v_j \cdot v_i = 0$ for $j \neq i$. Therefore $\{v_1, \dots, v_r\}$ is an orthogonal set.

Finally, for $i = 1, \dots, r$, define $u_i = v_i / \|v_i\|$. Then $\{u_1, \dots, u_r\}$ is orthonormal and $\mathrm{Span}\{u_1, \dots, u_r\} = \mathrm{Span}\{w_1, \dots, w_r\}$.

The Gram-Schmidt Process: According to the proof of the above theorem, we can write down $v_1, \dots, v_r$ and $u_1, \dots, u_r$ directly as follows:

$$v_1 = w_1,$$
$$v_2 = w_2 - \frac{w_2 \cdot v_1}{v_1 \cdot v_1} v_1,$$
$$\vdots$$
$$v_i = w_i - \frac{w_i \cdot v_1}{v_1 \cdot v_1} v_1 - \dots - \frac{w_i \cdot v_{i-1}}{v_{i-1} \cdot v_{i-1}} v_{i-1},$$
$$\vdots$$
$$v_r = w_r - \frac{w_r \cdot v_1}{v_1 \cdot v_1} v_1 - \dots - \frac{w_r \cdot v_{r-1}}{v_{r-1} \cdot v_{r-1}} v_{r-1},$$

and

$$u_1 = \frac{v_1}{\|v_1\|}, \quad u_2 = \frac{v_2}{\|v_2\|}, \quad \dots, \quad u_r = \frac{v_r}{\|v_r\|}.$$

This procedure is called the Gram-Schmidt process, and it can be used to create an orthonormal basis for any subspace of $\mathbb{R}^n$.

Examples: Textbook P402, P405.
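A direct transcription of the process into NumPy (a minimal sketch: `gram_schmidt` is our own name, and the input columns are assumed linearly independent):

```python
import numpy as np

def gram_schmidt(W):
    """Orthonormalise the columns of W (assumed linearly independent)."""
    V = []
    for w in W.T:
        # Subtract the projections of w onto all previously built v_i.
        v = w - sum((w @ v_i) / (v_i @ v_i) * v_i for v_i in V)
        V.append(v)
    # Normalise: u_i = v_i / ||v_i||.
    return np.column_stack([v / np.linalg.norm(v) for v in V])

W = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0]])
U = gram_schmidt(W)
print(U.T @ U)  # ~ I_2: the columns are orthonormal
```

In floating-point arithmetic the modified Gram-Schmidt variant is usually preferred for numerical stability; the classical form above simply mirrors the formulas in the proof.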
QR Decomposition: an Application of Gram-Schmidt

An application of Gram-Schmidt is to decompose a matrix $A$ whose columns are linearly independent into the form $A = QR$, where $Q$ is a matrix whose columns form an orthonormal basis for $\mathrm{Col}(A)$, and $R$ is an upper triangular invertible matrix with positive entries on its diagonal. More specifically, we have the following theorem:

Theorem 22 (QR factorisation). Let $A = (w_1 \cdots w_r) \in \mathbb{R}^{m \times r}$ with $\mathrm{rank}(A) = r$. Then $A$ can be factorised as $A = QR$, where $Q \in \mathbb{R}^{m \times r}$ is such that the columns of $Q$ form an orthonormal basis of $\mathrm{Col}(A)$, and $R \in \mathbb{R}^{r \times r}$ is an upper triangular matrix whose diagonal entries are positive.

Proof. Since $\mathrm{rank}(A) = r$, the set $\{w_1, \dots, w_r\}$ is linearly independent. So by Theorem 21 we can construct an orthogonal set $\{v_1, \dots, v_r\}$ and an orthonormal set $\{u_1, \dots, u_r\}$ such that for $i = 1, \dots, r$,

$$w_i = \frac{w_i \cdot v_1}{v_1 \cdot v_1} v_1 + \dots + \frac{w_i \cdot v_{i-1}}{v_{i-1} \cdot v_{i-1}} v_{i-1} + v_i, \qquad v_i = \|v_i\|\, u_i.$$

That is,

$$(w_1 \cdots w_r) = (v_1 \cdots v_r) \begin{pmatrix} 1 & \frac{w_2 \cdot v_1}{v_1 \cdot v_1} & \cdots & \frac{w_r \cdot v_1}{v_1 \cdot v_1} \\ 0 & 1 & \cdots & \frac{w_r \cdot v_2}{v_2 \cdot v_2} \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix}$$

and

$$(v_1 \cdots v_r) = (u_1 \cdots u_r)\, \mathrm{diag}(\|v_1\|, \|v_2\|, \dots, \|v_r\|).$$

So

$$(w_1 \cdots w_r) = (u_1 \cdots u_r) \begin{pmatrix} \|v_1\| & & \\ & \ddots & \\ & & \|v_r\| \end{pmatrix} \begin{pmatrix} 1 & \frac{w_2 \cdot v_1}{v_1 \cdot v_1} & \cdots & \frac{w_r \cdot v_1}{v_1 \cdot v_1} \\ 0 & 1 & \cdots & \frac{w_r \cdot v_2}{v_2 \cdot v_2} \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix} = (u_1 \cdots u_r) \begin{pmatrix} \|v_1\| & w_2 \cdot u_1 & \cdots & w_r \cdot u_1 \\ 0 & \|v_2\| & \cdots & w_r \cdot u_2 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \|v_r\| \end{pmatrix},$$

where we used $\|v_i\| \cdot \frac{w_j \cdot v_i}{v_i \cdot v_i} = \frac{w_j \cdot v_i}{\|v_i\|} = w_j \cdot u_i$.
Take

$$Q = (u_1 \cdots u_r) \quad \text{and} \quad R = \begin{pmatrix} \|v_1\| & w_2 \cdot u_1 & \cdots & w_r \cdot u_1 \\ 0 & \|v_2\| & \cdots & w_r \cdot u_2 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \|v_r\| \end{pmatrix},$$

and the result follows.

Example: Textbook P406.
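As a sketch, the factorisation can also be obtained with NumPy's built-in QR routine. Note that `np.linalg.qr` does not promise positive diagonal entries in $R$, so we flip signs afterwards to match the convention of Theorem 22 (the matrix $A$ is our own illustrative choice):

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0]])

Q, R = np.linalg.qr(A)   # reduced QR: Q is 3x2, R is 2x2 upper triangular

# Flipping the signs of matching columns of Q and rows of R leaves the
# product Q R unchanged and gives R a positive diagonal, as in Theorem 22.
s = np.sign(np.diag(R))
Q, R = Q * s, s[:, None] * R

print(Q.T @ Q)                 # ~ I_2: orthonormal columns
print(R)                       # upper triangular, positive diagonal
print(np.allclose(A, Q @ R))   # True
```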