COMPUTING THE NORM OF A MATRIX

KEITH CONRAD

1. Introduction

In $\mathbf{R}^n$ there is a standard notion of length: the size of a vector $v = (a_1, \dots, a_n)$ is
$$\|v\| = \sqrt{a_1^2 + \cdots + a_n^2}.$$
We will discuss in Section 2 the general concept of length in a vector space, called a norm, and then look at norms on matrices in Section 3. In Section 4 we will see how the matrix norm that is closely connected to the standard norm on $\mathbf{R}^n$ can be computed from eigenvalues of an associated symmetric matrix.

2. Norms on Vector Spaces

Let $V$ be a vector space over $\mathbf{R}$. A norm on $V$ is a function $\|\cdot\|\colon V \to \mathbf{R}$ satisfying three properties:

(1) $\|v\| \ge 0$ for all $v \in V$, with equality if and only if $v = 0$,
(2) $\|v + w\| \le \|v\| + \|w\|$ for all $v$ and $w$ in $V$,
(3) $\|cv\| = |c|\,\|v\|$ for all $c \in \mathbf{R}$ and $v \in V$.

The same definition applies to complex vector spaces. From a norm on $V$ we get a metric on $V$ by $d(v, w) = \|v - w\|$. The triangle inequality for this metric is a consequence of the second property of norms.

Example 2.1. The standard norm on $\mathbf{R}^n$, using the standard basis $e_1, \dots, e_n$, is
$$\left\|\sum_{i=1}^n a_i e_i\right\| = \sqrt{\sum_{i=1}^n a_i^2}.$$
This gives rise to the Euclidean metric on $\mathbf{R}^n$:
$$d\Bigl(\sum a_i e_i, \sum b_i e_i\Bigr) = \sqrt{\sum (a_i - b_i)^2}.$$

Example 2.2. The sup-norm on $\mathbf{R}^n$ is
$$\left\|\sum_{i=1}^n a_i e_i\right\|_{\sup} = \max_i |a_i|.$$
This gives rise to the sup-metric on $\mathbf{R}^n$:
$$d\Bigl(\sum a_i e_i, \sum b_i e_i\Bigr) = \max_i |a_i - b_i|.$$

Example 2.3. On $\mathbf{C}^n$ the standard norm and sup-norm are defined similarly to the case of $\mathbf{R}^n$, but we need $|z_i|^2$ instead of $z_i^2$ when $z_i$ is complex:
$$\left\|\sum_{i=1}^n a_i e_i\right\| = \sqrt{\sum_{i=1}^n |a_i|^2}, \qquad \left\|\sum_{i=1}^n a_i e_i\right\|_{\sup} = \max_i |a_i|.$$
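The two norms in Examples 2.1 and 2.2 are easy to compute directly. The following sketch (plain Python, no outside libraries; the helper names `std_norm` and `sup_norm` are my own) computes both and spot-checks norm properties (2) and (3) on sample vectors.

```python
import math

def std_norm(v):
    # Standard (Euclidean) norm: sqrt(a_1^2 + ... + a_n^2).
    return math.sqrt(sum(a * a for a in v))

def sup_norm(v):
    # Sup-norm: max |a_i|.
    return max(abs(a) for a in v)

v = [3.0, -4.0]
w = [1.0, 2.0]

print(std_norm(v))   # sqrt(9 + 16) = 5.0
print(sup_norm(v))   # max(3, 4) = 4.0

# Spot-check the triangle inequality (property (2)) and homogeneity (property (3)):
vw = [a + b for a, b in zip(v, w)]
assert std_norm(vw) <= std_norm(v) + std_norm(w)
assert sup_norm([2 * a for a in v]) == 2 * sup_norm(v)
```

A numeric check like this is no substitute for the proofs that follow, but it makes the definitions concrete.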
A common way of placing a norm on a real vector space $V$ is by an inner product, which is a pairing $(\cdot,\cdot)\colon V \times V \to \mathbf{R}$ that is

(1) bilinear: linear in each component when the other is fixed. Linearity in the first component means $(v + v', w) = (v, w) + (v', w)$ and $(cv, w) = c(v, w)$ for $v, v', w \in V$ and $c \in \mathbf{R}$, and similarly in the second component.
(2) symmetric: $(v, w) = (w, v)$,
(3) positive-definite: $(v, v) \ge 0$, with equality if and only if $v = 0$.

The standard inner product on $\mathbf{R}^n$ is the dot product:
$$\Bigl(\sum_{i=1}^n a_i e_i, \sum_{i=1}^n b_i e_i\Bigr) = \sum_{i=1}^n a_i b_i.$$
For an inner product $(\cdot,\cdot)$ on $V$, a norm can be defined by the formula $\|v\| = \sqrt{(v, v)}$. That this is actually a norm on $V$ follows from the Cauchy–Schwarz inequality
$$(2.1) \qquad |(v, w)| \le \sqrt{(v, v)}\sqrt{(w, w)} = \|v\|\,\|w\|$$
as follows. For all $v$ and $w$ in $V$,
$$\|v + w\|^2 = (v + w, v + w) = (v, v) + (v, w) + (w, v) + (w, w) = \|v\|^2 + 2(v, w) + \|w\|^2$$
$$\le \|v\|^2 + 2|(v, w)| + \|w\|^2 \quad \text{since } a \le |a| \text{ for all } a \in \mathbf{R}$$
$$\le \|v\|^2 + 2\|v\|\,\|w\| + \|w\|^2 \quad \text{by (2.1)}$$
$$= (\|v\| + \|w\|)^2.$$
Taking (positive) square roots of both sides yields $\|v + w\| \le \|v\| + \|w\|$. A proof of the Cauchy–Schwarz inequality is in the appendix.

Using the standard inner product on $\mathbf{R}^n$, the Cauchy–Schwarz inequality assumes its classical form, as proven by Cauchy (1821):
$$\left|\sum_{i=1}^n a_i b_i\right| \le \sqrt{\sum_{i=1}^n a_i^2}\,\sqrt{\sum_{i=1}^n b_i^2}.$$
The Cauchy–Schwarz inequality (2.1) is true for every inner product on a real vector space, not just the standard inner product on $\mathbf{R}^n$.

While the norm on $\mathbf{R}^n$ that comes from the standard inner product is the standard norm, the sup-norm on $\mathbf{R}^n$ does not arise from an inner product, i.e., there is no inner product whose associated norm is the sup-norm. Even though the sup-norm and the standard norm on $\mathbf{R}^n$ are not equal, they are each bounded by a constant multiple of the other one:
$$(2.2) \qquad \max_i |a_i| \le \sqrt{\sum_{i=1}^n a_i^2} \le \sqrt{n}\,\max_i |a_i|,$$
i.e., $\|v\|_{\sup} \le \|v\| \le \sqrt{n}\,\|v\|_{\sup}$ for all $v \in \mathbf{R}^n$. Therefore the metrics these two norms give rise to determine the same notions of convergence: a sequence in $\mathbf{R}^n$ that is convergent with
respect to one of the metrics is also convergent with respect to the other metric. Also $\mathbf{R}^n$ is complete with respect to both of these metrics.

The standard inner product on $\mathbf{R}^n$ is closely tied to transposition of $n \times n$ matrices. For $A = (a_{ij}) \in M_n(\mathbf{R})$, let $A^\top = (a_{ji})$ be its transpose. Then for all $v, w \in \mathbf{R}^n$,
$$(2.3) \qquad (Av, w) = (v, A^\top w),$$
where $(\cdot,\cdot)$ is the standard inner product on $\mathbf{R}^n$.

We now briefly indicate what inner products are on complex vector spaces. An inner product on a complex vector space $V$ is a pairing $(\cdot,\cdot)\colon V \times V \to \mathbf{C}$ that is

(1) linear on the left and conjugate-linear on the right: it is additive in each component with the other one fixed, and $(cv, w) = c(v, w)$ and $(v, cw) = \overline{c}(v, w)$ for $c \in \mathbf{C}$,
(2) conjugate-symmetric: $(v, w) = \overline{(w, v)}$,
(3) positive-definite: $(v, v) \ge 0$, with equality if and only if $v = 0$.

Physicists usually define inner products on complex vector spaces as being linear on the right and conjugate-linear on the left. It is just a difference in notation. The standard inner product on $\mathbf{C}^n$ is not the dot product, but has a conjugation in the second component:
$$(2.4) \qquad \Bigl(\sum_{i=1}^n a_i e_i, \sum_{i=1}^n b_i e_i\Bigr) = \sum_{i=1}^n a_i \overline{b_i}.$$
This standard inner product on $\mathbf{C}^n$ is closely tied to conjugate-transposition of $n \times n$ complex matrices. For $A = (a_{ij}) \in M_n(\mathbf{C})$, let $A^* = \overline{A^\top} = (\overline{a_{ji}})$ be its conjugate-transpose. Then for all $v, w \in \mathbf{C}^n$,
$$(2.5) \qquad (Av, w) = (v, A^* w).$$
An inner product on a complex vector space satisfies the Cauchy–Schwarz inequality, so it can be used to define a norm just as in the case of inner products on real vector spaces. Although we will be focusing on norms on finite-dimensional spaces, the extension of these ideas to infinite-dimensional spaces is quite important in both analysis and physics (quantum mechanics).

Exercises.

1. Letting $(\cdot,\cdot)_m$ be the dot product on $\mathbf{R}^m$ and $(\cdot,\cdot)_n$ be the dot product on $\mathbf{R}^n$, show for each $m \times n$ matrix $A$, $v \in \mathbf{R}^n$, and $w \in \mathbf{R}^m$ that $(Av, w)_m = (v, A^\top w)_n$. When $m = n$ this becomes (2.3).

2. Verify (2.5).

3. Defining Norms on Matrices

From now on, the norm and inner product on $\mathbf{R}^n$ and $\mathbf{C}^n$ are the standard ones.
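The identity (2.3), which links transposition to the standard inner product, will be used repeatedly with matrix norms, and it is easy to check on a concrete example. Here is a small sketch (plain Python; the helper names `dot`, `mat_vec`, and `transpose` are my own) verifying $(Av, w) = (v, A^\top w)$ for one sample matrix and pair of vectors.

```python
def dot(v, w):
    # Standard inner product (dot product) on R^n.
    return sum(a * b for a, b in zip(v, w))

def mat_vec(A, v):
    # Matrix-vector product Av, with A stored as a list of rows.
    return [dot(row, v) for row in A]

def transpose(A):
    # A^T: entry (i, j) becomes entry (j, i).
    return [list(col) for col in zip(*A)]

A = [[1.0, 2.0], [3.0, 4.0]]
v = [5.0, -1.0]
w = [2.0, 7.0]

# Both sides of (2.3) should agree.
print(dot(mat_vec(A, v), w))             # (Av, w)  -> 83.0
print(dot(v, mat_vec(transpose(A), w)))  # (v, A^T w) -> 83.0
```

One example does not prove (2.3), of course; the general proof is a short index computation (Exercise 1 above treats the rectangular case).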
The set of $n \times n$ real matrices $M_n(\mathbf{R})$ forms a real vector space. How should we define a norm on $M_n(\mathbf{R})$? One idea is to view $M_n(\mathbf{R})$ as $\mathbf{R}^{n^2}$ and use the sup-norm on $\mathbf{R}^{n^2}$:
$$\|(a_{ij})\|_{\sup} = \max_{i,j} |a_{ij}|,$$
or the standard norm on $\mathbf{R}^{n^2}$:
$$\|(a_{ij})\| = \sqrt{\sum_{i,j} a_{ij}^2}.$$
These turn out not to be the best choices. Before indicating a better norm on $M_n(\mathbf{R})$, let's use the sup-norm on $M_n(\mathbf{R})$ to show that
each $n \times n$ matrix changes the (standard) length of vectors in $\mathbf{R}^n$ by a uniformly bounded amount that depends on $n$ and the matrix. For $v = (c_1, \dots, c_n) \in \mathbf{R}^n$,
$$\|Av\| \le \sqrt{n}\,\|Av\|_{\sup} \quad \text{by (2.2)}$$
$$= \sqrt{n}\,\max_i \Bigl|\sum_{j=1}^n a_{ij} c_j\Bigr|$$
$$\le \sqrt{n}\,\max_i \sum_{j=1}^n |a_{ij}|\,|c_j|$$
$$\le \sqrt{n}\,\max_i \sum_{j=1}^n |a_{ij}|\,\|v\|_{\sup}$$
$$\le \sqrt{n}\, n \max_{i,j} |a_{ij}|\,\|v\|_{\sup}$$
$$\le n\sqrt{n}\,\max_{i,j} |a_{ij}|\,\|v\| \quad \text{by (2.2)}.$$
Let $C = n\sqrt{n}\,\max_{i,j} |a_{ij}|$. This is a constant depending on the dimension $n$ of the space and the matrix $A$, but not on $v$, and $\|Av\| \le C\|v\|$ for all $v$. By linearity, $\|Av - Aw\| \le C\|v - w\|$ for all $v, w \in \mathbf{R}^n$. Let's write down the above calculation as a lemma.

Lemma 3.1. For each $A \in M_n(\mathbf{R})$, there is a $C \ge 0$ such that $\|Av\| \le C\|v\|$ for all $v \in \mathbf{R}^n$.

The constant $C$ we wrote down might not be optimal. Perhaps there is a smaller constant $C' < C$ such that $\|Av\| \le C'\|v\|$ for all $v \in \mathbf{R}^n$. We will get a norm on $M_n(\mathbf{R})$ by assigning to each $A \in M_n(\mathbf{R})$ the least $C \ge 0$ such that $\|Av\| \le C\|v\|$ for all $v \in \mathbf{R}^n$, where the vector norms in $\|Av\|$ and $\|v\|$ are the standard ones on $\mathbf{R}^n$.

Theorem 3.2. For each $A \in M_n(\mathbf{R})$, there is a unique real number $b \ge 0$ such that
(i) $\|Av\| \le b\|v\|$ for all $v \in \mathbf{R}^n$,
(ii) $b$ is minimal: if $\|Av\| \le C\|v\|$ for all $v \in \mathbf{R}^n$, then $b \le C$.

Proof. We first show by a scaling argument that $\|Av\| \le b\|v\|$ for all $v \in \mathbf{R}^n$ if and only if $\|Av\| \le b$ for all $v \in \mathbf{R}^n$ with $\|v\| = 1$. The direction ($\Rightarrow$) is clear by using $\|v\| = 1$. For the direction ($\Leftarrow$), when $v = 0$ we trivially have $\|Av\| = 0 = b\|v\|$. When $v \ne 0$ let $c = \|v\|$, so $c > 0$ and the vector $v/c$ has norm 1 (indeed, $\|v/c\| = (1/|c|)\|v\| = (1/\|v\|)\|v\| = 1$), so by hypothesis $\|A(v/c)\| \le b$, which implies $(1/|c|)\|Av\| \le b$ by linearity. Now multiply both sides by $c$ to get $\|Av\| \le bc = b\|v\|$.

Therefore the theorem is saying the set $\{\|Av\| : \|v\| = 1\}$ has a (finite) maximum value, which is $b$. This is what we will prove. The matrix $A$ as a function $\mathbf{R}^n \to \mathbf{R}^n$ is continuous since the components of $Av$ are linear functions of the components of $v$, and hence they are each continuous in $v$. The standard norm $\|\cdot\|\colon \mathbf{R}^n \to \mathbf{R}$ is also continuous since it is the square root of a polynomial function of the coordinates. Finally, since the unit ball in $\mathbf{R}^n$, $\{v \in \mathbf{R}^n : \|v\| = 1\}$, is compact, its continuous image $\{\|Av\| : \|v\| = 1\}$ in $\mathbf{R}$ is also compact.
Every compact subset of $\mathbf{R}$ contains a maximum point, so we are done.

Definition 3.3. For $A \in M_n(\mathbf{R})$, $\|A\|$ is the smallest nonnegative real number satisfying the inequality $\|Av\| \le \|A\|\,\|v\|$ for all $v \in \mathbf{R}^n$. This is called the operator norm of $A$.
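Since the operator norm is a maximum over the unit sphere, one can estimate it from below by sampling unit vectors. The sketch below (plain Python; the names are my own, and the sampling gives only a crude lower-bound estimate, not an exact value) does this for a $2 \times 2$ matrix and compares the result with the much larger constant $C = n\sqrt{n}\,\max_{i,j}|a_{ij}|$ from the proof of Lemma 3.1.

```python
import math

def mat_vec(A, v):
    # Matrix-vector product Av.
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def norm(v):
    # Standard norm on R^n.
    return math.sqrt(sum(a * a for a in v))

A = [[1.0, 2.0], [3.0, 4.0]]
n = 2

# Crude constant from the calculation before Lemma 3.1: C = n*sqrt(n)*max|a_ij|.
C = n * math.sqrt(n) * max(abs(a) for row in A for a in row)

# Estimate ||A|| from below: sample unit vectors (cos t, sin t) on the circle.
estimate = 0.0
for k in range(1000):
    t = 2 * math.pi * k / 1000
    estimate = max(estimate, norm(mat_vec(A, [math.cos(t), math.sin(t)])))

print(estimate)  # close to the true operator norm, about 5.46
print(C)         # 2*sqrt(2)*4 = 8*sqrt(2), about 11.31 -- a much weaker bound
assert estimate <= C
```

In dimension 2 the unit sphere is a circle, so dense sampling works; in higher dimensions this brute-force approach degrades quickly, which is one motivation for the eigenvalue formula of Section 4.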
Theorem 3.2 shows $\|A\|$ exists and is the maximum of $\|Av\|$ as $v$ runs over the unit ball in $\mathbf{R}^n$. The next theorem shows the operator norm on $M_n(\mathbf{R})$ is a vector space norm and has a host of other nice properties.

Theorem 3.4. For $A, B \in M_n(\mathbf{R})$ and $v, w \in \mathbf{R}^n$,
(i) $\|A\| \ge 0$, with equality if and only if $A = O$,
(ii) $\|A + B\| \le \|A\| + \|B\|$,
(iii) $\|cA\| = |c|\,\|A\|$ for $c \in \mathbf{R}$,
(iv) $\|AB\| \le \|A\|\,\|B\|$. It is typically false that $\|AB\| = \|A\|\,\|B\|$.
(v) $\|A^\top\| = \|A\|$,
(vi) $\|AA^\top\| = \|A^\top A\| = \|A\|^2$. Thus $\|A\| = \sqrt{\|AA^\top\|} = \sqrt{\|A^\top A\|}$.
(vii) $|(Av, w)| \le \|A\|\,\|v\|\,\|w\|$,
(viii) $\|A\|_{\sup} \le \|A\| \le n\sqrt{n}\,\|A\|_{\sup}$, and $M_n(\mathbf{R})$ is complete with respect to the metric coming from the operator norm.

Proof. (i) It is obvious that $\|A\| \ge 0$. If $\|A\| = 0$ then for all $v \in \mathbf{R}^n$ we have $\|Av\| \le 0 \cdot \|v\| = 0$, so $\|Av\| = 0$. Thus $Av = 0$ for all $v \in \mathbf{R}^n$, so $A = O$. The converse is trivial.

(ii) For all $v \in \mathbf{R}^n$,
$$\|(A + B)v\| = \|Av + Bv\| \le \|Av\| + \|Bv\| \le \|A\|\,\|v\| + \|B\|\,\|v\| = (\|A\| + \|B\|)\|v\|.$$
Since $\|A + B\|$ is the least $C \ge 0$ such that $\|(A + B)v\| \le C\|v\|$ for all $v \in \mathbf{R}^n$, $\|A + B\| \le \|A\| + \|B\|$.

(iii) Left to the reader.

(iv) For all $v \in \mathbf{R}^n$, $\|(AB)v\| = \|A(Bv)\| \le \|A\|\,\|Bv\| \le \|A\|\,\|B\|\,\|v\|$, so the minimality property of the operator norm implies $\|AB\| \le \|A\|\,\|B\|$. To show that generally $\|AB\| \ne \|A\|\,\|B\|$, note that if $\|AB\| = \|A\|\,\|B\|$ for all $A$ and $B$ in $M_n(\mathbf{R})$ then for nonzero $A$ and $B$ we'd have $\|AB\| \ne 0$, so $AB \ne O$. That is, the product of two nonzero matrices is always nonzero. This is false when $n > 1$, since there are many nonzero matrices whose square is zero.

(v) For all $v \in \mathbf{R}^n$, we get by (2.3) that
$$\|Av\|^2 = (Av, Av) = (v, A^\top Av) \le \|v\|\,\|A^\top Av\| \quad \text{by Cauchy–Schwarz}.$$
This last expression is at most $\|A^\top A\|\,\|v\|^2$, so $\|Av\| \le \sqrt{\|A^\top A\|}\,\|v\|$. The least $C \ge 0$ such that $\|Av\| \le C\|v\|$ for all $v \in \mathbf{R}^n$ is $\|A\|$, so $\|A\| \le \sqrt{\|A^\top A\|}$. Squaring both sides,
$$(3.1) \qquad \|A\|^2 \le \|A^\top A\| \le \|A^\top\|\,\|A\|.$$
Dividing by $\|A\|$ when $A \ne O$, we get $\|A\| \le \|A^\top\|$. This is also obvious if $A = O$, so
$$(3.2) \qquad \|A\| \le \|A^\top\| \quad \text{for all } A \in M_n(\mathbf{R}).$$
Now replace $A$ by $A^\top$ in (3.2) to get $\|A^\top\| \le \|(A^\top)^\top\| = \|A\|$,
so $\|A^\top\| = \|A\|$.

(vi) Feeding the conclusion of (v) back into (3.1),
$$\|A\|^2 \le \|A^\top A\| \le \|A^\top\|\,\|A\| = \|A\|^2,$$
so $\|A\|^2 = \|A^\top A\|$. Using $A^\top$ in place of $A$ here, we get $\|A\|^2 = \|AA^\top\|$ since $\|A^\top\| = \|A\|$.

(vii) Use Cauchy–Schwarz: $|(Av, w)| \le \|Av\|\,\|w\| \le \|A\|\,\|v\|\,\|w\|$.

(viii) Set $v = e_j$ and $w = e_i$ in (vii): $|a_{ij}| = |(Ae_j, e_i)| \le \|A\|$. Therefore $\|A\|_{\sup} \le \|A\|$. The other inequality follows from the calculation leading up to Lemma 3.1. That $M_n(\mathbf{R})$ is complete with respect to the metric $d(A, B) = \|A - B\|$ coming from the operator norm follows from completeness of $M_n(\mathbf{R})$ with respect to the metric coming from the sup-norm (view $M_n(\mathbf{R})$ as $\mathbf{R}^{n^2}$) and the fact that these two norms on $M_n(\mathbf{R})$ are bounded by constant multiples of each other.

The operator norm on $M_n(\mathbf{R})$ interacts nicely with the multiplicative structure on $M_n(\mathbf{R})$ and the standard inner product on $\mathbf{R}^n$ (parts (iv) through (viii) of Theorem 3.4). However, unlike the standard norm on $\mathbf{R}^n$, the operator norm on $M_n(\mathbf{R})$ is impossible to calculate from its definition in all but the simplest cases. For instance, it is clear that $\|I_n\| = 1$, so $\|cI_n\| = |c|$ for all $c \in \mathbf{R}$. But what is
$$\left\|\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}\right\|?$$
By the last part of Theorem 3.4, this norm is bounded above by $2\sqrt{2}\,(4) = 8\sqrt{2} \approx 11.3$. In the next section we will give a formula for the operator norm on $M_n(\mathbf{R})$ that will allow us to compute $\left\|\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}\right\|$ easily, and it will turn out to be about $5.5$.

Exercises.

1. Rework this section for rectangular matrices that need not be square. For $A$ in $M_{m,n}(\mathbf{R})$, define its operator norm $\|A\|_{m,n}$ to be the least $b \ge 0$ such that $\|Av\|_m \le b\|v\|_n$ for all $v \in \mathbf{R}^n$, where the subscripts in the inequality indicate the standard norm on the Euclidean space of the relevant dimension ($m$ or $n$). Show $\|A\|_{m,n}$ exists and $\|A\|_{m,n} = \|A^\top\|_{n,m} = \sqrt{\|AA^\top\|_{m,m}} = \sqrt{\|A^\top A\|_{n,n}}$. You will want to use the relationship between the transpose on $M_{m,n}(\mathbf{R})$ and dot products on $\mathbf{R}^m$ and $\mathbf{R}^n$ in Exercise 2.1.

2. Define an operator norm on $M_n(\mathbf{C})$ and establish an analogue of Theorem 3.4. Is it true that a matrix in $M_n(\mathbf{C})$ generally has the same operator norm as its transpose?
For a real $n \times n$ matrix, show its operator norm as an element of $M_n(\mathbf{R})$ equals its operator norm as an element of $M_n(\mathbf{C})$.

4. A Computational Formula for a Matrix Norm

The key idea for computing the operator norm of $A \in M_n(\mathbf{R})$ is that Theorem 3.4(vi) tells us it is the square root of the operator norm of $AA^\top$ and of $A^\top A$. What makes $AA^\top$ and $A^\top A$ special is that they are symmetric (equal to their own transposes), e.g., $(AA^\top)^\top =$
$(A^\top)^\top A^\top = AA^\top$.¹ The following theorem gives a method to compute operator norms of symmetric matrices and then of general square matrices.

Theorem 4.1. (1) If $A \in M_n(\mathbf{R})$ satisfies $A^\top = A$ then all the eigenvalues of $A$ are real and
$$(4.1) \qquad \|A\| = \max_{\text{eigenvalues } \lambda \text{ of } A} |\lambda|.$$
(2) For all $A \in M_n(\mathbf{R})$, the eigenvalues of $AA^\top$ and $A^\top A$ are all nonnegative.

Proof. (1) To prove that when $A^\top = A$ all the eigenvalues of $A$ are real, let $A$ act on $\mathbf{C}^n$ in the obvious way. Using the standard inner product (2.4) on $\mathbf{C}^n$ (not the dot product!), we have by (2.5) that for all $v \in \mathbf{C}^n$,
$$(Av, v) = (v, A^* v) = (v, \overline{A}^\top v) = (v, A^\top v) = (v, Av),$$
since $A$ has real entries and $A^\top = A$. For an eigenvalue $\lambda \in \mathbf{C}$ of $A$, let $v \in \mathbf{C}^n$ be a corresponding eigenvector. Then
$$(Av, v) = (\lambda v, v) = \lambda(v, v), \qquad (v, Av) = (v, \lambda v) = \overline{\lambda}(v, v),$$
so $\lambda = \overline{\lambda}$ since $(v, v) = \|v\|^2 \ne 0$ (eigenvectors are nonzero). Thus $\lambda$ is real.

To relate $\|A\|$ to the eigenvalues of $A$ as in (4.1), we will use a fundamental property of real symmetric matrices that is called the Spectral Theorem. It asserts that every symmetric matrix $A \in M_n(\mathbf{R})$ has a basis of mutually orthogonal eigenvectors in $\mathbf{R}^n$.² Let $v_1, \dots, v_n$ be a basis of mutually orthogonal eigenvectors for $A$, with corresponding eigenvalues $\lambda_1, \dots, \lambda_n$. What is special about orthogonal vectors is that their squared lengths add (the Pythagorean theorem): if $(v, w) = 0$ then
$$\|v + w\|^2 = (v + w, v + w) = (v, v) + 2(v, w) + (w, w) = (v, v) + (w, w) = \|v\|^2 + \|w\|^2,$$
and likewise for a sum of more than two mutually orthogonal vectors. Order the eigenvalues of $A$ so that $|\lambda_1| \le \cdots \le |\lambda_n|$. For each $v \in \mathbf{R}^n$, write it in terms of the basis of eigenvectors as $v = c_1 v_1 + \cdots + c_n v_n$. Then
$$Av = c_1 A(v_1) + \cdots + c_n A(v_n) = c_1\lambda_1 v_1 + \cdots + c_n\lambda_n v_n.$$
Since the $v_i$'s are mutually perpendicular, their scalar multiples $c_i\lambda_i v_i$ are mutually perpendicular.
Therefore
$$\|Av\|^2 = \|c_1\lambda_1 v_1\|^2 + \cdots + \|c_n\lambda_n v_n\|^2$$
$$= c_1^2\lambda_1^2\|v_1\|^2 + \cdots + c_n^2\lambda_n^2\|v_n\|^2$$
$$\le c_1^2\lambda_n^2\|v_1\|^2 + \cdots + c_n^2\lambda_n^2\|v_n\|^2 \quad \text{since } |\lambda_i| \le |\lambda_n|$$
$$= \lambda_n^2(c_1^2\|v_1\|^2 + \cdots + c_n^2\|v_n\|^2)$$
$$= \lambda_n^2(\|c_1 v_1\|^2 + \cdots + \|c_n v_n\|^2)$$
$$= \lambda_n^2\|c_1 v_1 + \cdots + c_n v_n\|^2$$
$$= \lambda_n^2\|v\|^2,$$

¹ This is also true if $A \in M_{m,n}(\mathbf{R})$ is a rectangular matrix, which makes Section 4 applicable to operator norms of rectangular matrices by Exercise 3.1.
² The Spectral Theorem includes the assertion that all eigenvalues of $A$ are real, which we showed above.
so $\|Av\| \le |\lambda_n|\,\|v\|$. Since this inequality holds for all $v$ in $\mathbf{R}^n$, we have $\|A\| \le |\lambda_n|$. To prove $\|A\| = |\lambda_n|$ it now suffices to find a single nonzero vector $v$ such that $\|Av\| = |\lambda_n|\,\|v\|$. For that we can use $v = v_n$, since $Av_n = \lambda_n v_n$.

(2) Since $AA^\top$ and $A^\top A$ are both symmetric, all their eigenvalues are real. Let $\lambda \in \mathbf{R}$ be an eigenvalue of $AA^\top$ with corresponding eigenvector $v \in \mathbf{R}^n$, and let $\mu \in \mathbf{R}$ be an eigenvalue of $A^\top A$ with corresponding eigenvector $w \in \mathbf{R}^n$. Using the standard inner product on $\mathbf{R}^n$,
$$0 \le (Aw, Aw) = (w, A^\top A w) = (w, \mu w) = \mu(w, w).$$
Then $\mu \ge 0$ since $(w, w) = \|w\|^2 > 0$. Similarly,
$$0 \le (A^\top v, A^\top v) = (v, AA^\top v) = (v, \lambda v) = \lambda(v, v).$$
Since $(v, v) = \|v\|^2 > 0$, it follows that $\lambda \ge 0$.

Corollary 4.2. For $A \in M_n(\mathbf{R})$, $\|A\|$ is the square root of the largest eigenvalue of $AA^\top$ and is the square root of the largest eigenvalue of $A^\top A$.

Proof. By Theorem 3.4(vi), $\|A\| = \sqrt{\|AA^\top\|} = \sqrt{\|A^\top A\|}$. Now use (4.1), with $AA^\top$ and $A^\top A$ in place of $A$.

Remark 4.3. This corollary, without proof, goes back to Peano [2, p. 454] using $A^\top A$. On the same page Peano introduced the operator norm on $M_n(\mathbf{R})$ from Definition 3.3 and proved Theorem 3.4(ii) and (iv). In the same year (1888) Peano [3] introduced the first axiomatic treatment of real vector spaces (which he called "linear systems") of arbitrary dimension and linear operators on them; it was ahead of its time and largely forgotten, including by Peano himself. The main inspiration for the development of abstract linear algebra came from work on normed vector spaces by Banach in the 1920s [1].

Example 4.4. Let's compute the operator norm of the $2 \times 2$ matrix
$$A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}.$$
Since
$$AA^\top = \begin{pmatrix} 5 & 11 \\ 11 & 25 \end{pmatrix},$$
the characteristic polynomial of $AA^\top$ is $X^2 - 30X + 4$, which has roots $15 \pm \sqrt{221} \approx 0.13,\ 29.86$. Therefore $\|AA^\top\| = 15 + \sqrt{221}$, i.e., for all $\binom{x}{y} \in \mathbf{R}^2$,
$$\left\|\begin{pmatrix} 5x + 11y \\ 11x + 25y \end{pmatrix}\right\| \le (15 + \sqrt{221})\left\|\begin{pmatrix} x \\ y \end{pmatrix}\right\|,$$
and the operator norm of $A$ is $\sqrt{15 + \sqrt{221}} \approx 5.46$, the smallest number $b$ such that $\|Av\| \le b\|v\|$ for all $v \in \mathbf{R}^2$. Computing the operator norm of $A$ amounts to finding the largest eigenvalue of a related symmetric matrix ($AA^\top$ or $A^\top A$).
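Corollary 4.2 reduces Example 4.4 to a quadratic equation. The sketch below (plain Python; the helper name is my own) forms $AA^\top$ for $A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}$, solves the characteristic polynomial $X^2 - 30X + 4$ with the quadratic formula, and takes the square root of the larger root.

```python
import math

A = [[1.0, 2.0], [3.0, 4.0]]

def mat_mul_transpose(A):
    # AA^T for a 2x2 matrix A (entry (i,j) is the dot product of rows i and j).
    return [[sum(A[i][k] * A[j][k] for k in range(2)) for j in range(2)]
            for i in range(2)]

S = mat_mul_transpose(A)                       # [[5, 11], [11, 25]]
tr = S[0][0] + S[1][1]                         # trace = 30
det = S[0][0] * S[1][1] - S[0][1] * S[1][0]    # determinant = 4

# Characteristic polynomial X^2 - tr*X + det; larger root via the quadratic formula.
largest_eig = (tr + math.sqrt(tr * tr - 4 * det)) / 2   # 15 + sqrt(221)

op_norm = math.sqrt(largest_eig)
print(op_norm)   # sqrt(15 + sqrt(221)), about 5.46
```

For a general $2 \times 2$ symmetric matrix the same trace/determinant shortcut applies; for larger $n$ one would need the roots of a degree-$n$ characteristic polynomial, which is where the iterative methods mentioned below come in.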
In practice, for large symmetric matrices the largest eigenvalue is not computed by calculating the roots of the characteristic polynomial. More efficient algorithms for calculating eigenvalues are available (e.g., the QR algorithm or the Lanczos algorithm).
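One of the simplest iterative eigenvalue methods is power iteration: repeatedly apply the symmetric matrix to a vector and renormalize, and (for a generic starting vector) the Rayleigh quotient converges to the eigenvalue of largest absolute value. The sketch below (plain Python; a sketch of this general idea, not of the QR or Lanczos algorithms mentioned above) applies it to $AA^\top$ from Example 4.4.

```python
import math

def mat_vec(S, v):
    return [sum(a * b for a, b in zip(row, v)) for row in S]

def power_iteration(S, steps=100):
    # Approximate the eigenvalue of largest absolute value of a symmetric
    # matrix S by repeated multiplication and normalization.
    v = [1.0] * len(S)
    for _ in range(steps):
        w = mat_vec(S, v)
        scale = math.sqrt(sum(a * a for a in w))
        v = [a / scale for a in w]
    # Rayleigh quotient (v, Sv) with ||v|| = 1 approximates the eigenvalue.
    return sum(a * b for a, b in zip(v, mat_vec(S, v)))

S = [[5.0, 11.0], [11.0, 25.0]]   # AA^T from Example 4.4
lam = power_iteration(S)
print(math.sqrt(lam))   # about 5.46, matching Example 4.4
```

Power iteration converges quickly here because the two eigenvalues of $AA^\top$ ($\approx 29.87$ and $\approx 0.13$) are far apart; convergence slows when the top eigenvalues are close, which is one reason the more sophisticated algorithms exist.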
Exercises.

1. For $A \in M_{m,n}(\mathbf{R})$, show the eigenvalues of the $m \times m$ matrix $AA^\top$ and the $n \times n$ matrix $A^\top A$ are nonnegative. (The nonzero eigenvalues of these matrices are also equal: for all $A \in M_{m,n}(\mathbf{R})$ and $B \in M_{n,m}(\mathbf{R})$, the matrices $AB \in M_m(\mathbf{R})$ and $BA \in M_n(\mathbf{R})$ have the same nonzero eigenvalues.)

2. If you worked out properties of operator norms of rectangular matrices in Exercise 3.1, determine the operator norm of the $2 \times 3$ matrix
$$\begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{pmatrix}.$$

3. If we use a norm on $\mathbf{R}^n$ other than the standard norm, the corresponding operator norm on $M_n(\mathbf{R})$ will be different from the one we have worked out here. When $\mathbf{R}^n$ has a norm $\|\cdot\|$, the related operator norm of a matrix $A \in M_n(\mathbf{R})$ is the least $b \ge 0$ such that $\|Av\| \le b\|v\|$ for all $v \in \mathbf{R}^n$. Let's work out an example. Give $\mathbf{R}^n$ the sup-norm, so the associated operator norm of an $n \times n$ matrix $A$ is the least $b$ satisfying $\|Av\|_{\sup} \le b\|v\|_{\sup}$ for all $v \in \mathbf{R}^n$. What is $b$?

(a) For all $v \in \mathbf{R}^n$, show $\|Av\|_{\sup} \le b\|v\|_{\sup}$ where $b = \max_{1 \le i \le n} \sum_{j=1}^n |a_{ij}|$.

(b) For each row $(a_{i1}, \dots, a_{in})$ of $A$, show there is a vector $v$ in $\mathbf{R}^n$ with coordinates from $\{\pm 1\}$ such that the $i$th entry of $Av$ equals $|a_{i1}| + \cdots + |a_{in}|$. Conclude that there is a $v \in \mathbf{R}^n$ such that $\|Av\|_{\sup} = b\|v\|_{\sup}$, where $b$ is the number in part (a). Therefore the operator norm of $A$ when using the sup-norm on $\mathbf{R}^n$ is $b$.

4. Generalize Theorem 4.1 to give a computational formula for the operator norm of matrices in $M_n(\mathbf{C})$.

Appendix A. Proof of the Cauchy–Schwarz Inequality

We give a proof of the Cauchy–Schwarz inequality that was found by Schwarz [4, p. 344] in 1888. It is a clever trick with quadratic polynomials, and the context in which Schwarz discovered it is described in [5, pp. 10–11]. (The whole book [5] is recommended as a lively account of fundamental inequalities in mathematics.)

Let $V$ be a real vector space with an inner product $(\cdot,\cdot)$. Pick $v$ and $w$ in $V$. Our goal is to show $|(v, w)| \le \|v\|\,\|w\|$. This is obvious if $v$ or $w$ is $0$, so assume both are nonzero. For all $t \in \mathbf{R}$, $(v + tw, v + tw) \ge 0$.
The left side can be expanded into a quadratic polynomial in $t$:
$$(v + tw, v + tw) = (v, v) + (v, tw) + (tw, v) + (tw, tw) = (v, v) + t(v, w) + t(w, v) + t^2(w, w) = \|v\|^2 + 2(v, w)t + \|w\|^2 t^2.$$
This is quadratic since $\|w\|^2 > 0$. A quadratic polynomial in $t$ (with positive leading coefficient) has nonnegative values for all $t$ if and only if its discriminant is $\le 0$, so
$$(2(v, w))^2 - 4\|v\|^2\|w\|^2 \le 0,$$
which is equivalent to $|(v, w)| \le \|v\|\,\|w\|$, and that completes the proof!

Remark A.1. We have equality $|(v, w)| = \|v\|\,\|w\|$ if and only if the quadratic polynomial above has a double real root $t$, and at that root we get $(v + tw, v + tw) = 0$, so $v + tw = 0$ in $V$
and thus $v$ and $w$ are linearly dependent. The converse direction, that linear dependence implies equality in the Cauchy–Schwarz inequality, is left to the reader (also in the case that $v$ or $w$ is $0$).

References

[1] G. H. Moore, "The Axiomatization of Linear Algebra: 1875–1940," Historia Mathematica 22 (1995), 262–303. Online at https://core.ac.uk/download/pdf/82128888.pdf.
[2] G. Peano, "Intégration par séries des équations différentielles linéaires," Mathematische Annalen 32 (1888), 450–456.
[3] G. Peano, Calcolo Geometrico secondo l'Ausdehnungslehre di H. Grassmann, preceduto dalle operazioni della logica deduttiva, Fratelli Bocca, Turin, 1888. Online at http://mathematica.sns.it/opere/138/.
[4] H. A. Schwarz, "Über ein die Flächen kleinsten Flächeninhalts betreffendes Problem der Variationsrechnung," Acta Soc. Scient. Fenn. 15 (1888), 315–362.
[5] J. M. Steele, The Cauchy–Schwarz Master Class: An Introduction to the Art of Mathematical Inequalities, Cambridge Univ. Press, Cambridge, 2004.