NORMS ON SPACE OF MATRICES

1. Operator norms on the space of linear maps

Let $A$ be an $n \times n$ real matrix and $x_0$ a vector in $\mathbb{R}^n$. We would like to use the Picard iteration method to solve the following system of ordinary linear differential equations:

(1.1)  $x'(t) = Ax(t)$,  $x(0) = x_0$.

By integrating the differential equation (1.1), we obtain

$$x(t) = x_0 + \int_0^t Ax(s)\,ds.$$

Let $x_0(t) = x_0$ for $t \in [0,b]$ be the constant function and define $x_n : [0,b] \to \mathbb{R}^n$ recursively by

$$x_{n+1}(t) = x_0 + \int_0^t Ax_n(s)\,ds.$$

By induction, we obtain that for any $k \in \mathbb{N}$,

$$x_k(t) = x_0 + \frac{tA}{1!}x_0 + \cdots + \frac{(tA)^k}{k!}x_0.$$

We can rewrite $x_k(t)$ as

$$x_k(t) = \left(I_n + \frac{tA}{1!} + \cdots + \frac{(tA)^k}{k!}\right)x_0, \quad \text{for any } k.$$

Here $I_n$ denotes the $n \times n$ identity matrix. It is natural for us to ask: can we define

$$I_n + \sum_{k=1}^{\infty} \frac{(tA)^k}{k!}\,?$$

To define an infinite series of matrices, we need to define a norm on $M_n(\mathbb{R})$, where $M_n(\mathbb{R})$ is the space of $n \times n$ real matrices. Let us define a norm on the space $M_{m\times n}(\mathbb{R})$ of $m \times n$ matrices, where $m$ and $n$ are not necessarily equal.

Let $A$ be an $m \times n$ matrix. We define a function $L_A : \mathbb{R}^n \to \mathbb{R}^m$ by $L_A(x) = Ax$. It is routine to check that $L_A$ is a linear map. Conversely, given a linear map $T : \mathbb{R}^n \to \mathbb{R}^m$, can we find a unique $m \times n$ matrix $A$ such that $T = L_A$?

Let $T : \mathbb{R}^n \to \mathbb{R}^m$ be a linear map. For each $x \in \mathbb{R}^n$, we write $T(x) = (T_1(x), \ldots, T_m(x))$. Since $T$ is linear, $T_i : \mathbb{R}^n \to \mathbb{R}$ is linear for each $1 \le i \le m$.

Lemma 1.1. Let $S : \mathbb{R}^n \to \mathbb{R}$ be a linear map. Then there exists a vector $a \in \mathbb{R}^n$ such that $S(x) = \langle a, x\rangle$ for any $x \in \mathbb{R}^n$.

Proof. Let $\beta = \{e_1, \ldots, e_n\}$ be the standard basis for $\mathbb{R}^n$. We write $x = \sum_{i=1}^n x_i e_i$. By linearity of $S$, $S(x) = x_1 S(e_1) + \cdots + x_n S(e_n)$. Let $a_i = S(e_i)$ for $1 \le i \le n$. For $a = (a_1, \ldots, a_n)$, we have $S(x) = a_1 x_1 + \cdots + a_n x_n = \langle a, x\rangle$. □
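As a numerical sketch (not part of the notes), the partial sums $x_k(t) = (I_n + tA/1! + \cdots + (tA)^k/k!)\,x_0$ produced by the Picard iteration can be computed directly; the matrix $A$ below is an illustrative choice whose flow is a rotation.

```python
import numpy as np

# Sketch (illustrative, not from the notes): the k-th Picard partial sum
# x_k(t) = (I + tA/1! + ... + (tA)^k / k!) x0.
def picard_partial_sum(A, x0, t, k):
    n = A.shape[0]
    term = np.eye(n)            # (tA)^0 / 0!
    total = term.copy()
    for j in range(1, k + 1):
        term = term @ (t * A) / j   # (tA)^j / j! from the previous term
        total = total + term
    return total @ x0

A = np.array([[0.0, 1.0], [-1.0, 0.0]])   # x1' = x2, x2' = -x1
x0 = np.array([1.0, 0.0])
# The exact solution is x(t) = (cos t, -sin t); at t = pi/2 it is (0, -1).
approx = picard_partial_sum(A, x0, np.pi / 2, 30)
```

With 30 terms the partial sum already agrees with the exact solution to machine precision, anticipating the convergence of the matrix exponential series proved below.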
Since $T_i : \mathbb{R}^n \to \mathbb{R}$ is linear for each $1 \le i \le m$, we can choose $b_i = (a_{i1}, \ldots, a_{in}) \in \mathbb{R}^n$ such that $T_i(x) = \langle b_i, x\rangle$ for any $x \in \mathbb{R}^n$. Let $A$ be the $m \times n$ matrix

$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix},$$

i.e. $A$ is the matrix whose $i$-th row vector is $b_i$. By definition, $T = L_A$.

Proposition 1.1. Let $L(\mathbb{R}^n, \mathbb{R}^m)$ be the space of linear maps from $\mathbb{R}^n$ to $\mathbb{R}^m$. Then $L(\mathbb{R}^n, \mathbb{R}^m)$ is a vector subspace of the space $F(\mathbb{R}^n, \mathbb{R}^m)$ of functions from $\mathbb{R}^n$ to $\mathbb{R}^m$.

Theorem 1.1. Define $\varphi : M_{m\times n}(\mathbb{R}) \to L(\mathbb{R}^n, \mathbb{R}^m)$ by $A \mapsto L_A$. Then $\varphi$ is a linear isomorphism.

Proposition 1.2. Let $T : \mathbb{R}^n \to \mathbb{R}^m$ be a linear map. Then there exists $M > 0$ such that

$$\|T(x)\|_{\mathbb{R}^m} \le M \|x\|_{\mathbb{R}^n} \quad \text{for any } x \in \mathbb{R}^n.$$

Hence $T$ is a Lipschitz function.

Proof. Let us write $T = (T_1, \ldots, T_m)$. For each $1 \le i \le m$, we choose $b_i$ so that $T_i(x) = \langle b_i, x\rangle$ for all $x \in \mathbb{R}^n$. For any $x \in \mathbb{R}^n$,

$$\|T(x)\|_{\mathbb{R}^m}^2 = T_1(x)^2 + \cdots + T_m(x)^2 = \langle b_1, x\rangle^2 + \cdots + \langle b_m, x\rangle^2.$$

By the Cauchy–Schwarz inequality, $|\langle b_i, x\rangle| \le \|b_i\|_{\mathbb{R}^n} \|x\|_{\mathbb{R}^n}$. Hence for any $x \in \mathbb{R}^n$,

$$\|T(x)\|_{\mathbb{R}^m}^2 \le \left(\|b_1\|_{\mathbb{R}^n}^2 + \cdots + \|b_m\|_{\mathbb{R}^n}^2\right)\|x\|_{\mathbb{R}^n}^2.$$

Let $M = \left(\|b_1\|_{\mathbb{R}^n}^2 + \cdots + \|b_m\|_{\mathbb{R}^n}^2\right)^{1/2}$. This proves the assertion. □

Let $T : \mathbb{R}^n \to \mathbb{R}^m$ be a linear map and $M > 0$ be as above. For $\|x\|_{\mathbb{R}^n} = 1$, we find $\|T(x)\|_{\mathbb{R}^m} \le M$. This allows us to define the operator norm of $T$ by

$$\|T\|_{op} = \sup_{\|x\|_{\mathbb{R}^n} = 1} \|T(x)\|_{\mathbb{R}^m}.$$

Theorem 1.2. The space $L(\mathbb{R}^n, \mathbb{R}^m)$ of linear maps from $\mathbb{R}^n$ to $\mathbb{R}^m$ together with $\|\cdot\|_{op}$ is a real Banach space.

Proof. We leave it to the reader to verify that $\|\cdot\|_{op}$ is a norm on $L(\mathbb{R}^n, \mathbb{R}^m)$. The completeness of $L(\mathbb{R}^n, \mathbb{R}^m)$ with respect to $\|\cdot\|_{op}$ will be proved later. □

It is not difficult for us to prove the following properties.

Proposition 1.3. Let $T \in L(\mathbb{R}^n, \mathbb{R}^m)$ and $S \in L(\mathbb{R}^m, \mathbb{R}^p)$. Then
(1) $\|T(x)\|_{\mathbb{R}^m} \le \|T\|_{op} \|x\|_{\mathbb{R}^n}$, and
(2) $\|S \circ T\|_{op} \le \|S\|_{op} \|T\|_{op}$.
Proof. Let us prove (1). When $x = 0$, the statement is obvious. When $x \ne 0$, let $y = x/\|x\|_{\mathbb{R}^n}$. Then $\|T(y)\|_{\mathbb{R}^m} \le \|T\|_{op}$. By linearity of $T$,

$$\|T(y)\|_{\mathbb{R}^m} = \left\|T\left(\frac{x}{\|x\|_{\mathbb{R}^n}}\right)\right\|_{\mathbb{R}^m} = \frac{\|T(x)\|_{\mathbb{R}^m}}{\|x\|_{\mathbb{R}^n}} \le \|T\|_{op}.$$

Multiplying both sides of the above inequality by $\|x\|_{\mathbb{R}^n}$, we obtain (1).

For (2), we use (1):

$$\|(S \circ T)(x)\|_{\mathbb{R}^p} \le \|S\|_{op} \|T(x)\|_{\mathbb{R}^m} \le \|S\|_{op} \|T\|_{op} \|x\|_{\mathbb{R}^n}.$$

Hence $\|S \circ T\|_{op} \le \|S\|_{op} \|T\|_{op}$. □

Corollary 1.1. If $T : \mathbb{R}^n \to \mathbb{R}^n$ is linear, then $\|T^k\|_{op} \le \|T\|_{op}^k$ for any $k \in \mathbb{N}$.

Proof. This can be proved by induction. □

For $A \in M_{m\times n}(\mathbb{R})$, we define the matrix norm of $A$ to be the operator norm of $L_A$, i.e. $\|A\| = \|L_A\|_{op}$.

Theorem 1.3. The space $M_{m\times n}(\mathbb{R})$ of $m \times n$ real matrices together with the matrix norm is a real Banach space.

Proof. For each $A \in M_{m\times n}(\mathbb{R})$, denote

$$\|A\|_1 = \sum_{i=1}^m \sum_{j=1}^n |a_{ij}|, \qquad \|A\|_\infty = \max\{|a_{ij}| : 1 \le i \le m,\ 1 \le j \le n\}.$$

For any $A \in M_{m\times n}(\mathbb{R})$,

$$\|A\|_\infty \le \|A\| \le \|A\|_1.$$

(See the class note for the details of the proof of this inequality.)

Let $(A_k)$ be a Cauchy sequence in $M_{m\times n}(\mathbb{R})$. For any $\epsilon > 0$, there exists $K_\epsilon \in \mathbb{N}$ such that $\|A_k - A_l\| < \epsilon/4mn$ whenever $k, l \ge K_\epsilon$. Denote $A_k$ by $[a_{ij}(k)]$. For any $k, l \ge K_\epsilon$,

$$|a_{ij}(k) - a_{ij}(l)| \le \|A_k - A_l\| < \frac{\epsilon}{4mn}.$$

This implies that $(a_{ij}(k))_k$ is a Cauchy sequence in $\mathbb{R}$ for $1 \le i \le m$ and $1 \le j \le n$. By completeness of $\mathbb{R}$, $(a_{ij}(k))_k$ is convergent in $\mathbb{R}$. Denote $a_{ij} = \lim_{k\to\infty} a_{ij}(k)$ and $A = [a_{ij}]$. When $k \ge K_\epsilon$,

$$|a_{ij}(k) - a_{ij}| = \lim_{l\to\infty} |a_{ij}(k) - a_{ij}(l)| \le \frac{\epsilon}{4mn}.$$

For $k \ge K_\epsilon$,

$$\|A_k - A\| \le \|A_k - A\|_1 = \sum_{i,j} |a_{ij}(k) - a_{ij}| \le mn \cdot \frac{\epsilon}{4mn} = \frac{\epsilon}{4} < \epsilon.$$

This shows that $\lim_{k\to\infty} A_k = A$ in $M_{m\times n}(\mathbb{R})$. Hence $M_{m\times n}(\mathbb{R})$ is a real Banach space. □
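As a numerical sketch (not part of the notes): the supremum defining $\|A\|$ can be estimated by sampling unit vectors, and the inequalities $\|A\|_\infty \le \|A\| \le \|A\|_1$ used in the proof above can be checked directly. NumPy's `np.linalg.norm(A, 2)` computes exactly the operator norm.

```python
import numpy as np

# Sketch (illustrative, not from the notes): estimate ||A|| = sup ||Ax||
# over the unit sphere by sampling, and check ||A||_inf <= ||A|| <= ||A||_1.
rng = np.random.default_rng(0)
A = rng.normal(size=(3, 4))

xs = rng.normal(size=(100_000, 4))
xs /= np.linalg.norm(xs, axis=1, keepdims=True)      # points on S^3
estimate = np.max(np.linalg.norm(xs @ A.T, axis=1))  # sampled sup of ||Ax||

op = np.linalg.norm(A, 2)          # operator (spectral) norm
entry_max = np.max(np.abs(A))      # ||A||_inf: largest |a_ij|
entry_sum = np.sum(np.abs(A))      # ||A||_1: sum of |a_ij|
```

The sampled estimate is always at most the true norm and approaches it as the sample grows, in line with the supremum definition.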
This theorem implies that $L(\mathbb{R}^n, \mathbb{R}^m)$ forms a Banach space with respect to the operator norm. Given an $n \times n$ real matrix $A$, it makes sense for us to ask whether the limit of $(s_k)$ exists in $M_n(\mathbb{R})$, where

(1.2)  $$s_k = I_n + \frac{A}{1!} + \cdots + \frac{A^k}{k!}, \quad k \ge 1.$$

Theorem 1.4. Let $(V, \|\cdot\|)$ be a real Banach space and $(v_n)$ be a sequence of vectors in $V$. Suppose that $\sum_{n=1}^\infty \|v_n\|$ is convergent in $\mathbb{R}$. Then $\sum_{n=1}^\infty v_n$ is convergent in $(V, \|\cdot\|)$.

Proof. Exercise. □

Theorem 1.5 (Comparison Test). Let $(a_n)$ and $(b_n)$ be sequences of nonnegative real numbers. Suppose that
(1) $0 \le a_n \le b_n$ for any $n$,
(2) $\sum_{n=1}^\infty b_n$ is convergent in $\mathbb{R}$.
Then $\sum_{n=1}^\infty a_n$ is convergent.

Proof. Exercise. □

Corollary 1.2 (Comparison Test in a Banach Space). Let $(V, \|\cdot\|)$ be a real Banach space and $(v_n)$ be a sequence of vectors in $V$. Suppose that there exists a sequence of nonnegative real numbers $(b_n)$ such that
(1) $\|v_n\| \le b_n$ for any $n$,
(2) $\sum_{n=1}^\infty b_n$ is convergent in $\mathbb{R}$.
Then $\sum_{n=1}^\infty v_n$ is convergent in $(V, \|\cdot\|)$.

Let $v_1 = I_n$ and $v_k = A^{k-1}/(k-1)!$ for $k \ge 2$, and let $b_1 = 1$ and $b_k = \|A\|^{k-1}/(k-1)!$ for $k \ge 2$. Since $\|A^{k-1}\| \le \|A\|^{k-1}$ for any $k$, we find $\|v_k\| \le b_k$ for any $k$. Using elementary calculus, we know $\sum_{k=1}^\infty b_k$ is convergent in $\mathbb{R}$ (to $e^{\|A\|}$). By the comparison test in a Banach space, $\sum_{k=1}^\infty v_k$ is convergent in $(M_n(\mathbb{R}), \|\cdot\|)$, i.e. $(s_k)$ is convergent in $(M_n(\mathbb{R}), \|\cdot\|)$. We define $e^A$ to be the limit of $(s_k)$ in $(M_n(\mathbb{R}), \|\cdot\|)$, i.e.

$$e^A = \lim_{k\to\infty} s_k = I_n + \sum_{k=1}^\infty \frac{A^k}{k!} = \sum_{k=0}^\infty \frac{A^k}{k!}.$$

Here the right hand side is the expression for $\lim_{k\to\infty} s_k$. Similarly, we can define $\cos A$ and $\sin A$ for any $n \times n$ real matrix $A$ by

$$\cos A = \sum_{k=0}^\infty \frac{(-1)^k}{(2k)!} A^{2k}, \qquad \sin A = \sum_{k=0}^\infty \frac{(-1)^k}{(2k+1)!} A^{2k+1}.$$

We leave it to the reader to verify that $\cos A$ and $\sin A$ are well defined. (The infinite series of $n \times n$ matrices are convergent in $M_n(\mathbb{R})$.)

Theorem 1.6. Let $x(t) = e^{tA}x_0$ for $t \in [0,b]$. Then $x : [0,b] \to \mathbb{R}^n$ is the unique differentiable function which solves (1.1).

To prove this theorem, we need to introduce the space of vector-valued continuous functions. This theorem will be proved later. Let us study more about the space $M_n(\mathbb{R})$.
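As a numerical sketch (not part of the notes), the partial sums of the series for $\cos A$ and $\sin A$ can be computed with NumPy, and the identity $\cos^2 A + \sin^2 A = I_n$, which holds for every square matrix because the scalar identity is a power-series identity, can be checked on an example.

```python
import math
import numpy as np
from numpy.linalg import matrix_power

# Sketch (illustrative, not from the notes): partial sums of the series
# cos A = sum (-1)^k A^(2k) / (2k)!,  sin A = sum (-1)^k A^(2k+1) / (2k+1)!.
A = np.array([[0.0, 1.0], [-1.0, 0.0]])   # example matrix with A^2 = -I

cosA = sum(((-1) ** k / math.factorial(2 * k)) * matrix_power(A, 2 * k)
           for k in range(10))
sinA = sum(((-1) ** k / math.factorial(2 * k + 1)) * matrix_power(A, 2 * k + 1)
           for k in range(10))
```

Ten terms suffice here because the factorial denominators dominate any fixed power of $\|A\|$, exactly as in the comparison-test argument above.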
In calculus, we have seen that for each $x \in \mathbb{R}$ with $|x| < 1$,

$$\frac{1}{1-x} = \sum_{k=0}^\infty x^k.$$
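The matrix analogue of this geometric series, treated next for $\|A\| < 1$, can be previewed numerically; this is a sketch with NumPy, not part of the notes.

```python
import numpy as np

# Sketch (illustrative, not from the notes): for ||A|| < 1 the partial
# sums s_k = I + A + ... + A^k converge to the inverse of (I - A).
A = np.array([[0.1, 0.3], [0.2, 0.1]])
assert np.linalg.norm(A, 2) < 1          # hypothesis ||A|| < 1

B = np.zeros_like(A)
power = np.eye(2)                        # current power A^k, starting at A^0
for _ in range(200):
    B += power
    power = power @ A

exact = np.linalg.inv(np.eye(2) - A)
```

The geometric decay $\|A^k\| \le \|A\|^k$ makes the partial sums converge rapidly once $\|A\| < 1$.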
We may ask whether the analogous equality holds for an $n \times n$ matrix $A$ with $\|A\| < 1$. Since $\|A\| < 1$, $\sum_{k=0}^\infty \|A\|^k$ is convergent in $\mathbb{R}$. Here $A^0 = I_n$, the identity matrix. Since $\|A^k\| \le \|A\|^k$ and $M_n(\mathbb{R})$ is a real Banach space, $\sum_{k=0}^\infty A^k$ is convergent in $M_n(\mathbb{R})$.

Proposition 1.4. Let $A \in M_n(\mathbb{R})$ with $\|A\| < 1$ and $B = \sum_{k=0}^\infty A^k$. Then

$$B(I_n - A) = (I_n - A)B = I_n.$$

This implies that $I_n - A$ is invertible with $B = (I_n - A)^{-1}$.

Proof. Let $s_k = I_n + A + \cdots + A^k$ for each $k$. For each $k$,

$$A s_k = A + \cdots + A^{k+1} = s_{k+1} - I_n = s_k A.$$

Using the inequalities

$$\|A s_k - AB\| = \|A(s_k - B)\| \le \|A\| \|s_k - B\|, \qquad \|s_k A - BA\| = \|(s_k - B)A\| \le \|s_k - B\| \|A\|,$$

we know $\lim_{k\to\infty} A s_k = AB$ and $\lim_{k\to\infty} s_k A = BA$. On the other hand, $\lim_{k\to\infty} (s_{k+1} - I_n) = B - I_n$. We find that $AB = B - I_n = BA$. This implies that

$$(I_n - A)B = B(I_n - A) = I_n.$$

This proves our assertion. □

As an application of this proposition, let us prove the following.

Theorem 1.7. Let $GL_n(\mathbb{R})$ be the set of all real $n \times n$ invertible matrices. Then $GL_n(\mathbb{R})$ forms an open subset of $M_n(\mathbb{R})$.

Proof. Let $A \in GL_n(\mathbb{R})$. Choose $\epsilon = 1/\|A^{-1}\|$. We claim that $B(A, \epsilon) \subseteq GL_n(\mathbb{R})$. This is equivalent to saying that if $\|B - A\| < \epsilon$, i.e. $B \in B(A, \epsilon)$, then $B$ is invertible. Observe that

$$B = B - A + A = \big((B - A)A^{-1} + I\big)A.$$

Let $C = (B - A)A^{-1}$. Then $B = (I + C)A$ and

$$\|C\| \le \|B - A\| \|A^{-1}\| < \frac{1}{\|A^{-1}\|} \|A^{-1}\| = 1.$$

By the previous proposition (applied with $-C$ in place of $A$), $I + C$ is invertible. Since $A$ is invertible and the product of any two invertible matrices is again invertible, $B = (I + C)A$ is invertible. Hence $B \in GL_n(\mathbb{R})$. We proved that $B(A, \epsilon) \subseteq GL_n(\mathbb{R})$, and hence $A$ is an interior point of $GL_n(\mathbb{R})$. Since $A$ is arbitrary in $GL_n(\mathbb{R})$, $GL_n(\mathbb{R})$ is open. □

Theorem 1.8. Let $\varphi : GL_n(\mathbb{R}) \to GL_n(\mathbb{R})$ be the map $\varphi(A) = A^{-1}$ for $A \in GL_n(\mathbb{R})$. Then $\varphi$ is continuous.

Proof. Let $A \in GL_n(\mathbb{R})$ and choose $d = 1/(2\|A^{-1}\|)$. For any $B \in B(A, d)$, $\|B - A\| < d < 1/\|A^{-1}\|$. Hence $B$ is invertible; hence $B(A, d) \subseteq GL_n(\mathbb{R})$. Observe that for $B \in B(A, d)$,

$$\varphi(B) - \varphi(A) = B^{-1}(A - B)A^{-1},$$

and hence

$$\|\varphi(B) - \varphi(A)\| \le \|B^{-1}\| \|A - B\| \|A^{-1}\|.$$
Let us estimate $\|B^{-1}\|$ when $B \in B(A, d)$. For each $y \in \mathbb{R}^n$,

$$\|y\|_{\mathbb{R}^n} = \|A^{-1}(Ay)\|_{\mathbb{R}^n} \le \|A^{-1}\| \|Ay\|_{\mathbb{R}^n}.$$

Hence $\|A^{-1}\|^{-1} \|y\|_{\mathbb{R}^n} \le \|Ay\|_{\mathbb{R}^n}$. By the triangle inequality and the norm inequality,

$$\|Ay\|_{\mathbb{R}^n} = \|(A - B)y + By\|_{\mathbb{R}^n} \le \|(A - B)y\|_{\mathbb{R}^n} + \|By\|_{\mathbb{R}^n} \le \|A - B\| \|y\|_{\mathbb{R}^n} + \|By\|_{\mathbb{R}^n}.$$

This shows that for any $y \in \mathbb{R}^n$,

$$\left(\frac{1}{\|A^{-1}\|} - \|A - B\|\right) \|y\|_{\mathbb{R}^n} \le \|By\|_{\mathbb{R}^n}.$$

Since $\|A - B\| < d = 1/(2\|A^{-1}\|)$, we have $1/\|A^{-1}\| - \|A - B\| > 1/(2\|A^{-1}\|)$. This shows that for any $y \in \mathbb{R}^n$,

$$\frac{1}{2\|A^{-1}\|} \|y\|_{\mathbb{R}^n} \le \|By\|_{\mathbb{R}^n}.$$

Since $B$ is invertible, for any $x \in \mathbb{R}^n$ there exists a unique $y \in \mathbb{R}^n$ so that $By = x$. We find that $y = B^{-1}x$ and hence

$$\|B^{-1}x\|_{\mathbb{R}^n} \le 2\|A^{-1}\| \|x\|_{\mathbb{R}^n}.$$

We find that $\|B^{-1}\| \le 2\|A^{-1}\|$. Thus for any $B \in B(A, d)$,

$$\|\varphi(B) - \varphi(A)\| \le 2\|A^{-1}\|^2 \|A - B\|.$$

For any $\epsilon > 0$, we choose

$$\delta_{A,\epsilon} = \min\left\{\frac{\epsilon}{2\|A^{-1}\|^2},\, d\right\}.$$

Then for any $B \in B(A, \delta_{A,\epsilon})$,

$$\|\varphi(B) - \varphi(A)\| < 2\|A^{-1}\|^2 \,\delta_{A,\epsilon} \le \epsilon.$$

This proves that $\varphi$ is continuous. □
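The quantitative bound from the proof, $\|B^{-1}\| \le 2\|A^{-1}\|$ whenever $\|B - A\| < d = 1/(2\|A^{-1}\|)$, can be checked numerically; this is a sketch (the matrix $A$ and perturbation scheme are illustrative, not from the notes).

```python
import numpy as np

# Sketch (illustrative, not from the notes): check ||B^{-1}|| <= 2||A^{-1}||
# for perturbations B = A + E with ||E|| < d = 1/(2 ||A^{-1}||).
rng = np.random.default_rng(2)
A = np.array([[2.0, 1.0], [0.0, 3.0]])
inv_norm = np.linalg.norm(np.linalg.inv(A), 2)
d = 1.0 / (2.0 * inv_norm)

worst = 0.0
for _ in range(200):
    E = rng.normal(size=(2, 2))
    E *= 0.9 * d / np.linalg.norm(E, 2)   # rescale so ||E|| = 0.9 d < d
    B = A + E                             # B is invertible by Theorem 1.7
    worst = max(worst, np.linalg.norm(np.linalg.inv(B), 2))
```

Every sampled perturbation satisfies the bound, matching the estimate used to prove continuity of inversion.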
2. Spectral Theorem for symmetric matrices, with an application to the computation of operator norms

Let $A : \mathbb{R}^n \to \mathbb{R}^m$ be a linear map. The operator norm of $A$ is defined to be

$$\|A\| = \sup_{x \in S^{n-1}} \|Ax\|,$$

where $S^{n-1} = \{x \in \mathbb{R}^n : \|x\| = 1\}$. Then

$$\|A\|^2 = \sup_{x \in S^{n-1}} \|Ax\|^2 = \sup_{x \in S^{n-1}} \langle Ax, Ax\rangle = \sup_{x \in S^{n-1}} \langle A^tAx, x\rangle.$$

Denote $T = A^tA$. Then $T : \mathbb{R}^n \to \mathbb{R}^n$ is a linear map such that
(1) $T$ is symmetric, i.e. $T^t = T$, and
(2) $\langle Tx, x\rangle = \|Ax\|^2 \ge 0$ for any $x \in \mathbb{R}^n$.

Definition 2.1. A linear map $T : \mathbb{R}^n \to \mathbb{R}^n$ is nonnegative definite if it satisfies (1) and (2).

For any nonnegative definite linear map $T : \mathbb{R}^n \to \mathbb{R}^n$, we define a function $Q_T : \mathbb{R}^n \to \mathbb{R}$ by

$$Q_T(x) = \langle Tx, x\rangle, \quad x \in \mathbb{R}^n.$$

If we denote $T$ by $[T_{ij}]$ and $x = (x_1, \ldots, x_n)$, then

$$Q_T(x) = \sum_{i,j=1}^n T_{ij} x_i x_j.$$

We see that $Q_T$ is a real-valued continuous function. Since $S^{n-1}$ is closed and bounded, by the Bolzano–Weierstrass Theorem, $S^{n-1}$ is sequentially compact. By the Extreme Value Theorem, $Q_T$ attains its maximum (and also minimum) on $S^{n-1}$.

Let $\lambda_1$ be the maximum of $Q_T$ on $S^{n-1}$ and $v_1 \in S^{n-1}$ so that $Q_T(v_1) = \lambda_1$. Since $Q_T(x) \ge 0$ for any $x \in \mathbb{R}^n$, $\lambda_1 \ge 0$. Since $\langle Tv_1, v_1\rangle = \lambda_1$,

$$\langle(\lambda_1 I - T)v_1, v_1\rangle = 0.$$

Let us prove that, in fact, we have $\langle(\lambda_1 I - T)v_1, w\rangle = 0$ for any $w \in \mathbb{R}^n$. If the statement is true, then $Tv_1 = \lambda_1 v_1$, i.e. $\lambda_1$ is an eigenvalue of $T$ and $v_1$ is an eigenvector of $T$ corresponding to the eigenvalue $\lambda_1$.

Let $B = \lambda_1 I - T$. Then $\langle Bv_1, v_1\rangle = 0$. We want to show that $\langle Bv_1, w\rangle = 0$ for any $w \in \mathbb{R}^n$. Since any $w \in \mathbb{R}^n$ can be written uniquely as $w = av_1 + y$ for $a \in \mathbb{R}$ and $y \in \{v_1\}^\perp$,

$$\langle Bv_1, w\rangle = a\langle Bv_1, v_1\rangle + \langle Bv_1, y\rangle = \langle Bv_1, y\rangle.$$

Hence if we can show that $\langle Bv_1, y\rangle = 0$ for any $y \in \{v_1\}^\perp$, then $\langle Bv_1, w\rangle = 0$ for any $w \in \mathbb{R}^n$. Thus we only need to prove that $\langle Bv_1, w\rangle = 0$ holds for $w \in \{v_1\}^\perp$.

Since $Q_T(v) \le \lambda_1$ for any $v \in S^{n-1}$, $Q_T(x) \le \lambda_1 \|x\|^2$ for any $x \in \mathbb{R}^n$. This implies that $\langle Bx, x\rangle \ge 0$ for any $x \in \mathbb{R}^n$. For any $t \in \mathbb{R}$ and any $w \in \mathbb{R}^n$ (using that $B$ is symmetric),

$$0 \le \langle B(v_1 + tw), v_1 + tw\rangle = \langle Bv_1, v_1\rangle + 2\langle Bv_1, w\rangle t + \langle Bw, w\rangle t^2 = 2\langle Bv_1, w\rangle t + \langle Bw, w\rangle t^2.$$
Dividing by $t > 0$ and letting $t \to 0^+$, this shows that $\langle Bv_1, w\rangle \ge 0$; replacing $w$ by $-w$ gives $\langle Bv_1, w\rangle \le 0$. This shows that $\langle Bv_1, w\rangle = 0$.

Theorem 2.1. Let $T : \mathbb{R}^n \to \mathbb{R}^n$ be a nonnegative definite linear map. The number

$$\lambda_1 = \max_{x \in S^{n-1}} \langle Tx, x\rangle$$

is an eigenvalue of $T$, and any $v \in S^{n-1}$ with $\langle Tv, v\rangle = \lambda_1$ is an eigenvector of $T$ corresponding to $\lambda_1$.

Lemma 2.1. Let $T$, $\lambda_1$ and $v_1$ be as above. Let $V_1 = \mathrm{span}\{v_1\}$ and $W = V_1^\perp$. Then $\mathbb{R}^n = V_1 \oplus W$ and $T(W) \subseteq W$.

Proof. If $w \in W$, then $\langle v_1, w\rangle = 0$ and hence

$$\langle v_1, Tw\rangle = \langle Tv_1, w\rangle = \lambda_1 \langle v_1, w\rangle = 0.$$

This shows that $Tw \in V_1^\perp = W$. □

For any $x \in \mathbb{R}^n$, we can write $x = av_1 + w$ for $w \in W$. Hence $T(x) = a\lambda_1 v_1 + T(w)$. Let $T_1 : W \to W$ be the map defined by $T_1(w) = T(w)$. Since $T$ is linear, $T_1$ is linear. Furthermore, since $T$ is symmetric, for any $w, w' \in W$,

$$\langle T_1 w, w'\rangle = \langle Tw, w'\rangle = \langle w, Tw'\rangle = \langle w, T_1 w'\rangle.$$

This shows that $T_1$ is also symmetric. For any $w \in W$, $\langle T_1 w, w\rangle = \langle Tw, w\rangle \ge 0$. This shows that $T_1$ is nonnegative definite. Set

$$\lambda_2 = \max_{\{w \in W : \|w\| = 1\}} \langle Tw, w\rangle.$$

Remark. The set $\{w \in W : \|w\| = 1\}$ is the intersection of $W$ and $S^{n-1}$. If we denote $v_1$ by $(a_1, \ldots, a_n)$, then

$$W = \{(x_1, \ldots, x_n) \in \mathbb{R}^n : a_1 x_1 + \cdots + a_n x_n = 0\}.$$

Hence $W$ is a closed subset of $\mathbb{R}^n$. Therefore $W \cap S^{n-1}$ is closed. Since $S^{n-1}$ is bounded, $W \cap S^{n-1}$ is bounded. The intersection $\{w \in W : \|w\| = 1\}$ is closed and bounded; by the Bolzano–Weierstrass Theorem, it is sequentially compact. We apply the extreme value theorem to find $\lambda_2$.

Then $\lambda_1 \ge \lambda_2 \ge 0$. Choose $v_2 \in W$ so that $\|v_2\| = 1$ and $\lambda_2 = \langle Tv_2, v_2\rangle$. Then $v_2 \perp v_1$ and $T_1 v_2 = \lambda_2 v_2$. This shows that $Tv_2 = \lambda_2 v_2$.

By induction, we can find finitely many nonincreasing nonnegative real numbers $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n \ge 0$ and an orthonormal basis $\{v_i : 1 \le i \le n\}$ for $\mathbb{R}^n$ such that $Tv_i = \lambda_i v_i$ for $1 \le i \le n$. Since $\{v_i : 1 \le i \le n\}$ is an orthonormal basis for $\mathbb{R}^n$, for any $x \in \mathbb{R}^n$ we can write $x = \sum_{i=1}^n \langle x, v_i\rangle v_i$. By linearity of $T$, we obtain

$$Tx = \sum_{i=1}^n \lambda_i \langle x, v_i\rangle v_i.$$

Let $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$ and $V$ be the $n \times n$ matrix whose $i$-th column vector is $v_i$. Then $TV = V\Lambda$, which implies $T = V\Lambda V^t$ by $V^tV = I$.
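As a numerical sketch (not part of the notes): for $T = A^tA$, NumPy's `np.linalg.eigh` returns exactly the data produced by the construction above, real eigenvalues and an orthonormal eigenbasis with $T = V\Lambda V^t$.

```python
import numpy as np

# Sketch (illustrative, not from the notes): the spectral decomposition
# of the nonnegative definite matrix T = A^t A via numpy.linalg.eigh.
A = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
T = A.T @ A

lam, V = np.linalg.eigh(T)   # ascending eigenvalues, orthonormal columns
```

Note `eigh` orders eigenvalues in ascending order, the reverse of the notes' convention $\lambda_1 \ge \cdots \ge \lambda_n$.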
Theorem 2.2 (Spectral Theorem for nonnegative definite linear maps). Let $T : \mathbb{R}^n \to \mathbb{R}^n$ be a nonnegative definite linear map. Then there exist finitely many nonincreasing nonnegative real numbers $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n \ge 0$ and an orthonormal basis $\{v_i : 1 \le i \le n\}$ for $\mathbb{R}^n$ such that
(1) $Tv_i = \lambda_i v_i$ for $1 \le i \le n$,
(2) $Tx = \sum_{i=1}^n \lambda_i \langle x, v_i\rangle v_i$,
(3) $T = V\Lambda V^t$, where $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$ and $V$ is the $n \times n$ matrix whose $i$-th column vector is $v_i$.

Let $S : \mathbb{R}^n \to \mathbb{R}^n$ be any symmetric linear map. We consider

$$m = \min_{x \in S^{n-1}} \langle Sx, x\rangle.$$

Set $T = S - mI$. Then $T$ is nonnegative definite. By the spectral theorem for $T$, we choose $\lambda_i \in \mathbb{R}$ and $v_i \in \mathbb{R}^n$ such that $Tv_i = \lambda_i v_i$ as above. Then $Sv_i = (\lambda_i + m)v_i$ for $1 \le i \le n$. Let $\mu_i = \lambda_i + m$. Then $Sv_i = \mu_i v_i$ for $1 \le i \le n$. We see that $v_i$ is also an eigenvector of $S$ with eigenvalue $\mu_i$. Furthermore,

$$Sx = \sum_{i=1}^n \mu_i \langle x, v_i\rangle v_i.$$

Thus we have proved:

Corollary 2.1. The Spectral Theorem holds for any symmetric linear map.

Let us go back to the computation of $\|A\|$. It follows from the definition that $\|A\|^2$ is the largest eigenvalue of $A^tA$. Let us denote by $\lambda_1(A^tA)$ the largest eigenvalue of $A^tA$. Then

$$\|A\| = \sqrt{\lambda_1(A^tA)}.$$

Definition 2.2. Let $\lambda_i(A^tA)$ be the eigenvalues of $A^tA$ ordered such that

$$\lambda_1(A^tA) \ge \cdots \ge \lambda_n(A^tA) \ge 0.$$

The $i$-th singular value of $A$ is defined to be $s_i(A) = \sqrt{\lambda_i(A^tA)}$.

Corollary 2.2. Let $A : \mathbb{R}^n \to \mathbb{R}^m$ be a linear map. Then $\|A\| = s_1(A)$.

Now let us introduce the singular value decomposition for an arbitrary linear map. Let us recall a basic theorem in linear algebra.

Proposition 2.1. Let $A : \mathbb{R}^n \to \mathbb{R}^m$ be a linear map. Then
(1) $(\mathrm{Im}\,A)^\perp = \ker A^t$,
(2) $(\mathrm{Im}\,A^t)^\perp = \ker A$.

Proof. Let $z \in (\mathrm{Im}\,A)^\perp$. Then $\langle Ax, z\rangle = 0$ for any $x \in \mathbb{R}^n$. Since $\langle x, A^tz\rangle = \langle Ax, z\rangle = 0$ for any $x \in \mathbb{R}^n$, we have $A^tz = 0$. We see that $z \in \ker A^t$. Conversely, if $z \in \ker A^t$, then $A^tz = 0$. Hence $\langle x, A^tz\rangle = 0$ for any $x \in \mathbb{R}^n$, which implies that $\langle Ax, z\rangle = 0$ for any $x \in \mathbb{R}^n$. Therefore $z \in (\mathrm{Im}\,A)^\perp$. Statement (2) follows by applying (1) to $A^t$. □
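As a numerical sketch (not part of the notes): the identities $s_i(A) = \sqrt{\lambda_i(A^tA)}$ and $\|A\| = s_1(A)$ can be checked by computing the eigenvalues of $A^tA$ and comparing with NumPy's SVD and spectral norm.

```python
import numpy as np

# Sketch (illustrative, not from the notes): singular values as square
# roots of the eigenvalues of A^t A, and ||A|| = s_1(A).
A = np.array([[1.0, 0.0, 2.0], [0.0, 3.0, 1.0]])

lam = np.linalg.eigvalsh(A.T @ A)[::-1]   # eigenvalues, descending
lam = np.clip(lam, 0.0, None)             # clip tiny negatives from roundoff
s_from_eigs = np.sqrt(lam)

s = np.linalg.svd(A, compute_uv=False)    # descending singular values
op = np.linalg.norm(A, 2)                 # operator norm
```

For this $2 \times 3$ example, $A^tA$ is $3 \times 3$ with one zero eigenvalue, reflecting $\mathrm{rank}\,A = 2 < n = 3$.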
Choose an orthonormal basis $\{v_i : 1 \le i \le n\}$ such that $A^tAv_i = s_i^2 v_i$ for $1 \le i \le n$. Let us assume that $s_i = 0$ for $i > k$ and $s_k \ne 0$. Let $z_i = Av_i$ for $1 \le i \le k$. We find $A^tz_i = s_i^2 v_i$ for $1 \le i \le k$. Let us compute $\langle z_i, z_j\rangle$ for any $1 \le i, j \le k$:

$$\langle z_i, z_j\rangle = \langle Av_i, Av_j\rangle = \langle A^tAv_i, v_j\rangle = s_i^2 \langle v_i, v_j\rangle = s_i^2 \delta_{ij}.$$

Hence $\{z_i : 1 \le i \le k\}$ forms an orthogonal subset of $\mathrm{Im}\,A$. Since $Av_j = 0$ for $k+1 \le j \le n$, $\mathrm{nullity}\,A = \dim\ker A = n - k$. By the rank–nullity theorem, $\mathrm{rank}\,A + \mathrm{nullity}\,A = n$, which shows that $\mathrm{rank}\,A = k$. Since $\{z_i : 1 \le i \le k\}$ is orthogonal, it is linearly independent. We see that $\{z_i : 1 \le i \le k\}$ forms an orthogonal basis for $\mathrm{Im}\,A$. Let $u_i = z_i/s_i$ for $1 \le i \le k$. Then $\{u_i : 1 \le i \le k\}$ forms an orthonormal basis for $\mathrm{Im}\,A$. We write

$$Ax = \sum_{i=1}^k \langle Ax, u_i\rangle u_i = \sum_{i=1}^k \langle x, A^tu_i\rangle u_i.$$

Since $A^tu_i = A^t(z_i/s_i) = s_i v_i$, we find

$$Ax = \sum_{i=1}^k s_i \langle x, v_i\rangle u_i.$$

We can extend $\{u_i : 1 \le i \le k\}$ to an orthonormal basis $\{u_i : 1 \le i \le m\}$ for $\mathbb{R}^m$. Let $U$ be the $m \times m$ matrix whose $i$-th column vector is $u_i$ and $V$ be the $n \times n$ matrix whose $j$-th column vector is $v_j$. Then $AV = U\Sigma$, where $\Sigma$ is the $m \times n$ matrix whose diagonal entries are $s_1, \ldots, s_{\min(m,n)}$ and whose other entries are zero. Since $V^tV = I_n$ ($V$ is an orthogonal matrix), we obtain the matrix form of the singular value decomposition

$$A = U\Sigma V^t.$$

Theorem 2.3 (Singular Value Decomposition). Let $A : \mathbb{R}^n \to \mathbb{R}^m$ be any linear map. Suppose $k = \mathrm{rank}\,A$. There exist an orthonormal basis $\{v_i\}$ for $\mathbb{R}^n$, an orthonormal basis $\{u_i\}$ for $\mathrm{Im}\,A$, and a finite nonincreasing sequence of positive real numbers $s_1 \ge \cdots \ge s_k > 0$ such that
(1) $A^tAv_i = s_i^2 v_i$ for $1 \le i \le k$ and $A^tAv_j = 0$ for $k+1 \le j \le n$,
(2) $Av_i = s_i u_i$ for $1 \le i \le k$, $Av_j = 0$ for $k+1 \le j \le n$, and $A^tu_i = s_i v_i$ for $1 \le i \le k$,
(3) $Ax = \sum_{i=1}^k s_i \langle x, v_i\rangle u_i$ for any $x \in \mathbb{R}^n$,
(4) $A = U\Sigma V^t$, where $U$ is the $m \times m$ matrix whose $i$-th column vector is $u_i$ (after extending $\{u_i\}$ to an orthonormal basis of $\mathbb{R}^m$), $V$ is the $n \times n$ matrix whose $j$-th column vector is $v_j$, and $\Sigma$ is the $m \times n$ matrix with diagonal entries $s_1, \ldots, s_k, 0, \ldots, 0$.

Example 2.1. Let

$$A = \begin{pmatrix} 1 & 1 \\ 2 & 2 \end{pmatrix}.$$

(1) Find $\|A\|$. (2) Find the singular value decomposition of $A$.

The rank of $A$ is one and the image of $A$ is spanned by $(1, 2)^t$. Take

$$u_1 = \frac{1}{\sqrt{5}}\begin{pmatrix} 1 \\ 2 \end{pmatrix}, \qquad v_1 = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ 1 \end{pmatrix}.$$
Then $Av_1 = \sqrt{10}\,u_1$. Take

$$v_2 = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ -1 \end{pmatrix}, \qquad u_2 = \frac{1}{\sqrt{5}}\begin{pmatrix} 2 \\ -1 \end{pmatrix}.$$

Then $Av_2 = 0$.
The largest singular value of $A$ is $\sqrt{10}$ and hence $\|A\| = \sqrt{10}$. We see that

$$A \begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \end{pmatrix} = \begin{pmatrix} \frac{1}{\sqrt{5}} & \frac{2}{\sqrt{5}} \\ \frac{2}{\sqrt{5}} & -\frac{1}{\sqrt{5}} \end{pmatrix} \begin{pmatrix} \sqrt{10} & 0 \\ 0 & 0 \end{pmatrix}.$$

Hence we obtain the singular value decomposition of $A$ (in matrix form)

$$A = \begin{pmatrix} \frac{1}{\sqrt{5}} & \frac{2}{\sqrt{5}} \\ \frac{2}{\sqrt{5}} & -\frac{1}{\sqrt{5}} \end{pmatrix} \begin{pmatrix} \sqrt{10} & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} \end{pmatrix}.$$
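As a numerical sketch (not part of the notes), the decomposition of this example, assembled from $u_1$, $u_2$, $v_1$, $v_2$ and the singular value $\sqrt{10}$, can be verified with NumPy.

```python
import numpy as np

# Sketch (illustrative, not from the notes): verify the worked SVD of
# A = [[1, 1], [2, 2]] built from the orthonormal vectors in the example.
A = np.array([[1.0, 1.0], [2.0, 2.0]])

U = np.column_stack([[1, 2], [2, -1]]) / np.sqrt(5)   # columns u_1, u_2
V = np.column_stack([[1, 1], [1, -1]]) / np.sqrt(2)   # columns v_1, v_2
Sigma = np.diag([np.sqrt(10), 0.0])

recon = U @ Sigma @ V.T
op = np.linalg.norm(A, 2)
```

The reconstruction recovers $A$ exactly, and the operator norm agrees with $s_1(A) = \sqrt{10}$ as in Corollary 2.2.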