Lectures on Dynamic Systems and Control
Mohammed Dahleh, Munther A. Dahleh, George Verghese
Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology

Matrix Norms and Singular Value Decomposition

Introduction

In this lecture, we introduce the notion of a norm for matrices. The singular value decomposition or SVD of a matrix is then presented. The SVD exposes the 2-norm of a matrix, but its value to us goes much further: it enables the solution of a class of matrix perturbation problems that form the basis for the stability robustness concepts introduced later; it solves the so-called total least squares problem, which is a generalization of the least squares estimation problem considered earlier; and it allows us to clarify the notion of conditioning, in the context of matrix inversion. These applications of the SVD are presented at greater length in the next lecture.

Example 1

To provide some immediate motivation for the study and application of matrix norms, we begin with an example that clearly brings out the issue of matrix conditioning with respect to inversion. The question of interest is how sensitive the inverse of a matrix is to perturbations of the matrix. Consider inverting the matrix

$$A = \begin{pmatrix} 100 & 100 \\ 100.2 & 100 \end{pmatrix}.$$

A quick calculation shows that

$$A^{-1} = \begin{pmatrix} -5 & 5 \\ 5.01 & -5 \end{pmatrix}.$$

Now suppose we invert the perturbed matrix

$$A + \Delta A = \begin{pmatrix} 100 & 100 \\ 100.1 & 100 \end{pmatrix}.$$

The result now is

$$(A + \Delta A)^{-1} = A^{-1} + \Delta(A^{-1}) = \begin{pmatrix} -10 & 10 \\ 10.01 & -10 \end{pmatrix}.$$

Here $\Delta A$ denotes the perturbation in $A$, and $\Delta(A^{-1})$ denotes the resulting perturbation in $A^{-1}$. Evidently a 0.1% change in one entry of $A$ has resulted in a 100% change in the entries of $A^{-1}$. If we want to solve the problem $Ax = b$ where $b = [1 \;\; {-1}]^T$, then $x = A^{-1} b = [-10 \;\; 10.01]^T$, while after perturbation of $A$ we get $x + \Delta x = (A + \Delta A)^{-1} b = [-20 \;\; 20.01]^T$. Again, we see a 100% change in the entries of the solution with only a 0.1% change in the starting data.

The situation seen in the above example is much worse than what can ever arise in the scalar case. If $a$ is a scalar, then $d(a^{-1})/(a^{-1}) = -\,da/a$, so the fractional change in the inverse of $a$ has the same magnitude as the fractional change in $a$ itself. What is seen in the above example, therefore, is a purely matrix phenomenon. It would seem to be related to the fact that $A$ is nearly singular, in the sense that its columns are nearly dependent, its determinant is much smaller than its largest element, and so on. In what follows (see the next lecture), we shall develop a sound way to measure nearness to singularity, and show how this measure relates to sensitivity under inversion.

Before understanding such sensitivity to perturbations in more detail, we need ways to measure the "magnitudes" of vectors and matrices. We have already introduced the notion of vector norms in an earlier lecture, so we now turn to the definition of matrix norms.
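The computation in Example 1 is easy to reproduce. Below is a minimal sketch, using NumPy in place of the Matlab the notes assume; the matrices and right-hand side are exactly those of the example.

```python
import numpy as np

# Conditioning example: a 0.1% perturbation of one entry of A produces a
# 100% change in A^{-1} and in the solution of Ax = b.
A  = np.array([[100.0, 100.0], [100.2, 100.0]])
dA = np.array([[0.0, 0.0], [-0.1, 0.0]])      # perturbation: 100.2 -> 100.1
b  = np.array([1.0, -1.0])

print(np.linalg.inv(A))            # approx [[-5, 5], [5.01, -5]]
print(np.linalg.inv(A + dA))       # approx [[-10, 10], [10.01, -10]]
print(np.linalg.solve(A, b))       # approx [-10, 10.01]
print(np.linalg.solve(A + dA, b))  # approx [-20, 20.01]
```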

Matrix Norms

An $m \times n$ complex matrix may be viewed as an operator on the (finite dimensional) normed vector space $\mathbb{C}^n$:

$$A_{m \times n} : (\mathbb{C}^n, \|\cdot\|_2) \longrightarrow (\mathbb{C}^m, \|\cdot\|_2),$$

where the norm here is taken to be the standard Euclidean norm. Define the induced 2-norm of $A$ as follows:

$$\|A\|_2 = \sup_{x \neq 0} \frac{\|Ax\|_2}{\|x\|_2} = \max_{\|x\|_2 = 1} \|Ax\|_2 .$$

The term "induced" refers to the fact that the definition of a norm for vectors such as $Ax$ and $x$ is what enables the above definition of a matrix norm. From this definition, it follows that the induced norm measures the amount of "amplification" the matrix $A$ provides to vectors on the unit sphere in $\mathbb{C}^n$, i.e. it measures the "gain" of the matrix.

Rather than measuring the vectors $x$ and $Ax$ using the 2-norm, we could use any $p$-norm, the interesting cases being $p = 1, 2, \infty$. Our notation for this is

$$\|A\|_p = \max_{\|x\|_p = 1} \|Ax\|_p .$$

An important question to consider is whether or not the induced norm is actually a norm, in the sense defined for vectors earlier. Recall the three conditions that define a norm:

1. $\|x\| \geq 0$, and $\|x\| = 0 \Longleftrightarrow x = 0$;
2. $\|\alpha x\| = |\alpha| \, \|x\|$;
3. $\|x + y\| \leq \|x\| + \|y\|$.

Now let us verify that $\|A\|_p$ is a norm on $\mathbb{C}^{m \times n}$, using the preceding definition:

1. $\|A\|_p \geq 0$, since $\|Ax\|_p \geq 0$ for any $x$. Furthermore, $\|A\|_p = 0 \Longleftrightarrow A = 0$, since $\|A\|_p$ is calculated from the maximum of $\|Ax\|_p$ evaluated on the unit sphere.
2. $\|\alpha A\|_p = |\alpha| \, \|A\|_p$, which follows from $\|\alpha y\|_p = |\alpha| \, \|y\|_p$ (for any $y$).
3. The triangle inequality holds since
$$\|A + B\|_p = \max_{\|x\|_p = 1} \|(A + B)x\|_p \leq \max_{\|x\|_p = 1} \bigl( \|Ax\|_p + \|Bx\|_p \bigr) \leq \|A\|_p + \|B\|_p .$$

Induced norms have two additional properties that are very important:

1. $\|Ax\|_p \leq \|A\|_p \, \|x\|_p$, which is a direct consequence of the definition of an induced norm;
2. For $A_{m \times n}$ and $B_{n \times r}$,
$$\|AB\|_p \leq \|A\|_p \, \|B\|_p ,$$
which is called the submultiplicative property. This also follows directly from the definition:
$$\|ABx\|_p \leq \|A\|_p \, \|Bx\|_p \leq \|A\|_p \, \|B\|_p \, \|x\|_p \quad \text{for any } x .$$
Dividing by $\|x\|_p$,
$$\frac{\|ABx\|_p}{\|x\|_p} \leq \|A\|_p \, \|B\|_p ,$$
from which the result follows.

Before we turn to a more detailed study of ideas surrounding the induced 2-norm, which will be the focus of this lecture and the next, we make some remarks about the other induced norms of practical interest, namely the induced 1-norm and induced $\infty$-norm. We shall also
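The "maximum gain" reading of the induced 2-norm, and the submultiplicative property, can be checked numerically. A sketch, assuming NumPy; the random test matrices and their sizes are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 5))

# Gain ||Ax||_2 / ||x||_2 over many random directions: it never exceeds the
# induced 2-norm np.linalg.norm(A, 2), and approaches it from below.
xs = rng.standard_normal((4, 10000))
gains = np.linalg.norm(A @ xs, axis=0) / np.linalg.norm(xs, axis=0)
print(gains.max(), "<=", np.linalg.norm(A, 2))

# Submultiplicative property: ||AB||_2 <= ||A||_2 ||B||_2.
print(np.linalg.norm(A @ B, 2) <= np.linalg.norm(A, 2) * np.linalg.norm(B, 2))
```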

say something about an important matrix norm that is not an induced norm, namely the Frobenius norm.

It is a fairly simple exercise to prove that

$$\|A\|_1 = \max_{1 \leq j \leq n} \sum_{i=1}^{m} |a_{ij}| \quad \text{(max of absolute column sums of } A\text{)}$$

and

$$\|A\|_\infty = \max_{1 \leq i \leq m} \sum_{j=1}^{n} |a_{ij}| \quad \text{(max of absolute row sums of } A\text{)} .$$

(Note that these definitions reduce to the familiar ones for the 1-norm and $\infty$-norm of column vectors in the case $n = 1$.)

The proof for the induced $\infty$-norm involves two stages, namely:

1. Prove that the quantity above provides an upper bound $\beta$:
$$\|Ax\|_\infty \leq \beta \, \|x\|_\infty \quad \forall x ;$$
2. Show that this bound is achievable for some $x = \hat{x}$:
$$\|A\hat{x}\|_\infty = \beta \, \|\hat{x}\|_\infty \quad \text{for some } \hat{x} .$$

In order to show how these steps can be implemented, we give the details for the $\infty$-norm case. Let $x \in \mathbb{C}^n$ and consider

$$\|Ax\|_\infty = \max_{1 \leq i \leq m} \Bigl| \sum_{j=1}^{n} a_{ij} x_j \Bigr| \leq \max_{1 \leq i \leq m} \sum_{j=1}^{n} |a_{ij}| \, |x_j| \leq \Bigl( \max_{1 \leq i \leq m} \sum_{j=1}^{n} |a_{ij}| \Bigr) \max_{1 \leq j \leq n} |x_j| = \Bigl( \max_{1 \leq i \leq m} \sum_{j=1}^{n} |a_{ij}| \Bigr) \|x\|_\infty .$$

The above inequalities show that an upper bound is given by

$$\max_{\|x\|_\infty = 1} \|Ax\|_\infty \leq \max_{1 \leq i \leq m} \sum_{j=1}^{n} |a_{ij}| .$$

Now in order to show that this upper bound is achieved by some vector $\hat{x}$, let $\bar{i}$ be an index at which the row-sum expression achieves its maximum, that is,

$$\beta = \sum_{j=1}^{n} |a_{\bar{i}j}| .$$

Define the vector $\hat{x}$ as

$$\hat{x} = \begin{pmatrix} \operatorname{sgn}(a_{\bar{i}1}) \\ \operatorname{sgn}(a_{\bar{i}2}) \\ \vdots \\ \operatorname{sgn}(a_{\bar{i}n}) \end{pmatrix}.$$

Clearly $\|\hat{x}\|_\infty = 1$, and

$$\|A\hat{x}\|_\infty = \sum_{j=1}^{n} |a_{\bar{i}j}| = \beta .$$

The proof for the 1-norm proceeds in exactly the same way, and is left to the reader.

There are matrix norms, i.e. functions that satisfy the three defining conditions stated earlier, that are not induced norms. The most important example of this for us is the Frobenius norm:

$$\|A\|_F = \Bigl( \sum_{j=1}^{n} \sum_{i=1}^{m} |a_{ij}|^2 \Bigr)^{1/2} = \bigl( \operatorname{trace}(A^* A) \bigr)^{1/2} \quad \text{(verify)} .$$

In other words, the Frobenius norm is defined as the root sum of squares of the entries, i.e. the usual Euclidean 2-norm of the matrix when it is regarded simply as a vector in $\mathbb{C}^{mn}$. Although it can be shown that it is not an induced matrix norm, the Frobenius norm still has the submultiplicative property that was noted for induced norms. Yet other matrix norms may be defined (some of them without the submultiplicative property), but the ones above are the only ones of interest to us.
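The column-sum and row-sum formulas, the achieving vector $\hat{x}$ built from the sign pattern of the maximizing row, and the entrywise reading of the Frobenius norm can all be confirmed numerically. A sketch, assuming NumPy and an arbitrary random real test matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 4))

# Induced 1-norm: max absolute column sum; induced inf-norm: max absolute row sum.
print(np.linalg.norm(A, 1),      np.abs(A).sum(axis=0).max())
print(np.linalg.norm(A, np.inf), np.abs(A).sum(axis=1).max())

# The bound ||Ax||_inf <= (max row sum) ||x||_inf is tight: the sign pattern
# of the maximizing row achieves it.
i_bar = np.abs(A).sum(axis=1).argmax()
x_hat = np.sign(A[i_bar])                 # ||x_hat||_inf = 1
print(np.linalg.norm(A @ x_hat, np.inf), np.abs(A).sum(axis=1).max())

# Frobenius norm: root sum of squares of the entries = sqrt(trace(A* A)).
print(np.linalg.norm(A, 'fro'), np.sqrt(np.trace(A.T @ A)))
```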

; " # If S = S (ie S equals its Hermitian transose, in which case we s a y S is Hermitian), then there exists a unitary matrix such t h a t U SU = [diagonal matrix] For any matrix A, both A A and AA are Hermitian, and thus can always be diagonalized by unitary matrices For any matrix A, the eigenvalues of A A and AA are always real and non-negative (roved easily by contradiction) Theorem (Singular Value Decomosition, or SVD) Given any matrix A C mn, A can be written as mm mn A = nn U V () where U U = I, V V = I, = () r and i = ith nonzero eigenvalue of A The A i are termed the singular values of A, and ie, are arranged in order of descending magnitude, r > : Proof: We will rove this theorem for the case rank(a) = m the general case involves very little more than what is required for this case The matrix AA is Hermitian, and it can therefore be diagonalized by a unitary matrix U C mm, so that U U = AA : Note that = diag( : : : m) has real ositive diagonal entries i due to the fact that AA is ositive denite We can write = = diag( : : : m) Dene V C mn by V = ; U A V has orthonormal rows as can b e seen from the following calculation: V V = AA U ; = I Choose the matrix V in such a w ay that U V = V V is in C nn and unitary Dene the m n matrix = [ j] This imlies that In other words we h a ve A = U V V = V = U A: One cannot always diagonalize an arbitrary matrix cf the Jordan form

Example 2

For the matrix $A$ given at the beginning of this lecture, the SVD, computed easily in Matlab by writing [u, s, v] = svd(A), is (with entries shown to four decimal places)

$$A = \begin{pmatrix} 0.7068 & -0.7075 \\ 0.7075 & 0.7068 \end{pmatrix} \begin{pmatrix} 200.1 & 0 \\ 0 & 0.1 \end{pmatrix} \begin{pmatrix} 0.7075 & 0.7068 \\ 0.7068 & -0.7075 \end{pmatrix}^T .$$

Observations:

i) $$A A^* = U \Sigma V^* V \Sigma^T U^* = U \begin{pmatrix} \Sigma_r^2 & 0 \\ 0 & 0 \end{pmatrix} U^* ,$$ which tells us $U$ diagonalizes $A A^*$;

ii) $$A^* A = V \Sigma^T U^* U \Sigma V^* = V \begin{pmatrix} \Sigma_r^2 & 0 \\ 0 & 0 \end{pmatrix} V^* ,$$ which tells us $V$ diagonalizes $A^* A$;

iii) If $U$ and $V$ are expressed in terms of their columns, i.e.,

$$U = \begin{pmatrix} u_1 & u_2 & \cdots & u_m \end{pmatrix} \quad \text{and} \quad V = \begin{pmatrix} v_1 & v_2 & \cdots & v_n \end{pmatrix},$$

then

$$A = \sum_{i=1}^{r} \sigma_i u_i v_i^* ,$$

which is another way to write the SVD. The $u_i$ are termed the left singular vectors of $A$, and the $v_i$ are its right singular vectors. From this we see that we can alternately interpret $Ax$ as

$$Ax = \sum_{i=1}^{r} \sigma_i u_i \underbrace{(v_i^* x)}_{\text{projection}} ,$$

which is a weighted sum of the $u_i$, where the weights are the products of the singular values and the projections of $x$ onto the $v_i$.

Observation (iii) tells us that $\operatorname{Ra}(A) = \operatorname{span}\{u_1, \ldots, u_r\}$ (because $Ax = \sum_{i=1}^{r} c_i u_i$, where the $c_i$ are scalar weights). Since the columns of $U$ are independent, $\dim \operatorname{Ra}(A) = r = \operatorname{rank}(A)$, and $\{u_1, \ldots, u_r\}$ constitute a basis for the range space of $A$.

The null space of $A$ is given by $\operatorname{span}\{v_{r+1}, \ldots, v_n\}$. To see this:

$$U \Sigma V^* x = 0 \;\Longleftrightarrow\; \Sigma V^* x = 0 \;\Longleftrightarrow\; \begin{pmatrix} \sigma_1 v_1^* x \\ \vdots \\ \sigma_r v_r^* x \end{pmatrix} = 0 \;\Longleftrightarrow\; v_i^* x = 0, \;\; i = 1, \ldots, r \;\Longleftrightarrow\; x \in \operatorname{span}\{v_{r+1}, \ldots, v_n\} .$$
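These range and null-space descriptions can be read off directly from a computed SVD. A sketch, assuming NumPy and a deliberately rank-one test matrix chosen for illustration:

```python
import numpy as np

# Build a rank-one 3x4 matrix, then recover rank, range, and null space.
u = np.array([[1.0], [2.0], [3.0]])
v = np.array([[1.0, -1.0, 0.0, 2.0]])
A = u @ v

U, s, Vh = np.linalg.svd(A)
r = np.sum(s > 1e-12)                    # numerical rank (here r = 1)
range_basis = U[:, :r]                   # span{u_1, ..., u_r} = range of A
null_basis  = Vh[r:, :].conj().T         # span{v_{r+1}, ..., v_n} = null space
print(r)
print(np.allclose(A @ null_basis, 0))    # A annihilates the null-space basis
```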

Example 3

One application of the singular value decomposition is to the solution of a system of algebraic equations. Suppose $A$ is an $m \times n$ complex matrix and $b$ is a vector in $\mathbb{C}^m$. Assume that the rank of $A$ is equal to $k$, with $k < m$. We are looking for a solution of the linear system $Ax = b$. By applying the singular value decomposition procedure to $A$, we get

$$A = U \Sigma V^* = U \begin{pmatrix} \Sigma_1 & 0 \\ 0 & 0 \end{pmatrix} V^* ,$$

where $\Sigma_1$ is a $k \times k$ non-singular diagonal matrix. We will express the unitary matrices $U$ and $V$ columnwise as

$$U = \begin{pmatrix} u_1 & u_2 & \cdots & u_m \end{pmatrix}, \qquad V = \begin{pmatrix} v_1 & v_2 & \cdots & v_n \end{pmatrix}.$$

A necessary and sufficient condition for the solvability of this system of equations is that $u_i^* b = 0$ for all $i$ satisfying $k < i \leq m$; otherwise, the system of equations is inconsistent. This condition means that the vector $b$ must be orthogonal to the last $m - k$ columns of $U$. Therefore the system of linear equations can be written as

$$\begin{pmatrix} \Sigma_1 & 0 \\ 0 & 0 \end{pmatrix} V^* x = U^* b = \begin{pmatrix} u_1^* b \\ \vdots \\ u_k^* b \\ 0 \\ \vdots \\ 0 \end{pmatrix}.$$

Using the above equation and the invertibility of $\Sigma_1$, we can rewrite the system of equations as

$$\begin{pmatrix} v_1^* \\ v_2^* \\ \vdots \\ v_k^* \end{pmatrix} x = \begin{pmatrix} \frac{1}{\sigma_1} u_1^* b \\ \frac{1}{\sigma_2} u_2^* b \\ \vdots \\ \frac{1}{\sigma_k} u_k^* b \end{pmatrix}.$$

By using the fact that

$$\begin{pmatrix} v_1^* \\ v_2^* \\ \vdots \\ v_k^* \end{pmatrix} \begin{pmatrix} v_1 & v_2 & \cdots & v_k \end{pmatrix} = I ,$$

we obtain a solution of the form

$$x = \sum_{i=1}^{k} \frac{1}{\sigma_i} (u_i^* b) \, v_i .$$

From the observations that were made earlier, we know that the vectors $v_{k+1}, v_{k+2}, \ldots, v_n$ span the kernel of $A$, and therefore a general solution of the system of linear equations is given by

$$x = \sum_{i=1}^{k} \frac{1}{\sigma_i} (u_i^* b) \, v_i + \sum_{i=k+1}^{n} \alpha_i v_i ,$$

where the coefficients $\alpha_i$, with $i$ in the interval $k + 1 \leq i \leq n$, are arbitrary complex numbers.
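The construction in Example 3 translates directly into a few lines of code. A sketch, assuming NumPy, a hypothetical rank-one $A$, and a right-hand side chosen to satisfy the solvability condition:

```python
import numpy as np

# Solve Ax = b for a rank-k A via the SVD, as in Example 3.
u = np.array([[1.0], [2.0]])
A = u @ np.array([[1.0, 1.0, 0.0]])      # 2x3 matrix with rank k = 1
b = 3.0 * u[:, 0]                        # b lies in the range of A

U, s, Vh = np.linalg.svd(A)
k = np.sum(s > 1e-12)

# Solvability test: b orthogonal to u_{k+1}, ..., u_m.
print(np.allclose(U[:, k:].conj().T @ b, 0))

# Particular solution: sum over i of (u_i* b / sigma_i) v_i.
x0 = Vh[:k, :].conj().T @ (U[:, :k].conj().T @ b / s[:k])
print(np.allclose(A @ x0, b))
```

Adding any combination of the rows Vh[k:, :] to x0 gives the rest of the solution set, mirroring the arbitrary coefficients $\alpha_i$ above.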

Relationship to Matrix Norms

The singular value decomposition can be used to compute the induced 2-norm of a matrix $A$.

Theorem.

$$\|A\|_2 = \sup_{x \neq 0} \frac{\|Ax\|_2}{\|x\|_2} = \sigma_1 = \sigma_{\max}(A) ,$$

which tells us that the maximum amplification is given by the maximum singular value.

Proof:

$$\sup_{x \neq 0} \frac{\|Ax\|_2}{\|x\|_2} = \sup_{x \neq 0} \frac{\|U \Sigma V^* x\|_2}{\|x\|_2} = \sup_{x \neq 0} \frac{\|\Sigma V^* x\|_2}{\|V^* x\|_2} = \sup_{y \neq 0} \frac{\|\Sigma y\|_2}{\|y\|_2} = \sup_{y \neq 0} \frac{\bigl( \sum_{i=1}^{r} \sigma_i^2 |y_i|^2 \bigr)^{1/2}}{\bigl( \sum_{i} |y_i|^2 \bigr)^{1/2}} = \sigma_1 .$$

For $y = [1 \; 0 \; \cdots \; 0]^T$, $\|y\|_2 = 1$, and the supremum is attained. (Notice that this corresponds to $x = v_1$. Hence, $A v_1 = \sigma_1 u_1$.)

Another application of the singular value decomposition is in computing the minimal amplification a full rank matrix exerts on elements with 2-norm equal to 1.

Theorem. Given $A \in \mathbb{C}^{m \times n}$, suppose $\operatorname{rank}(A) = n$. Then

$$\min_{\|x\|_2 = 1} \|Ax\|_2 = \sigma_n(A) .$$

Note that if $\operatorname{rank}(A) < n$, then there is an $x$ such that the minimum is zero (rewrite $A$ in terms of its SVD to see this).

Proof: For any $\|x\|_2 = 1$,

$$\|Ax\|_2 = \|U \Sigma V^* x\|_2 = \|\Sigma V^* x\|_2 \quad \text{(invariant under multiplication by unitary matrices)} = \|\Sigma y\|_2$$

for $y = V^* x$. Now

$$\|\Sigma y\|_2 = \Bigl( \sum_{i=1}^{n} \sigma_i^2 |y_i|^2 \Bigr)^{1/2} \geq \sigma_n ,$$

since $\|y\|_2 = 1$. Note that the minimum is achieved for $y = [0 \; \cdots \; 0 \; 1]^T$; thus the proof is complete.
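Both theorems are easy to probe numerically: the largest gain of $A$ equals $\sigma_{\max}$ and is attained at $x = v_1$, while for a full-column-rank $A$ every unit-norm gain stays above $\sigma_n$. A sketch, assuming NumPy and an arbitrary random test matrix:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 3))          # full column rank almost surely
U, s, Vh = np.linalg.svd(A)

print(np.isclose(np.linalg.norm(A, 2), s[0]))   # ||A||_2 = sigma_max
print(np.allclose(A @ Vh[0], s[0] * U[:, 0]))   # A v_1 = sigma_1 u_1

# Random unit vectors: gains stay within [sigma_n, sigma_1].
xs = rng.standard_normal((3, 10000))
xs /= np.linalg.norm(xs, axis=0)
gains = np.linalg.norm(A @ xs, axis=0)
print(s[-1] <= gains.min(), gains.max() <= s[0] + 1e-12)
```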

[Figure: Graphical depiction of the mapping involving $A$. Note that $A v_1 = \sigma_1 u_1$ and that $A v_2 = \sigma_2 u_2$.]

The Frobenius norm can also be expressed quite simply in terms of the singular values. We leave you to verify that, for any matrix $A$, not necessarily square,

$$\|A\|_F = \Bigl( \sum_{j=1}^{n} \sum_{i=1}^{m} |a_{ij}|^2 \Bigr)^{1/2} = \bigl( \operatorname{trace}(A^* A) \bigr)^{1/2} = \Bigl( \sum_{i=1}^{r} \sigma_i^2 \Bigr)^{1/2} .$$

Example 4 (Matrix Inequality)

We say $A \geq B$, for two square matrices $A$ and $B$, if $x^* A x \geq x^* B x$ for all $x \neq 0$. It follows that

$$\|A\|_2 \leq 1 \;\Longleftrightarrow\; A^* A \leq I .$$
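Both facts are easy to confirm numerically. A sketch, assuming NumPy; the scaling of $A$ in the second check is a hypothetical construction to force $\|B\|_2 < 1$:

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((4, 4))
s = np.linalg.svd(A, compute_uv=False)

# Frobenius norm equals the root sum of squared singular values.
print(np.isclose(np.linalg.norm(A, 'fro'), np.sqrt(np.sum(s**2))))

# ||B||_2 <= 1  iff  B* B <= I, i.e. iff I - B* B is positive semidefinite.
B = A / (s[0] + 0.1)                     # scaled so that ||B||_2 < 1
eigs = np.linalg.eigvalsh(np.eye(4) - B.T @ B)
print(np.all(eigs >= -1e-12))            # I - B* B is PSD
```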

Exercises

Exercise 1. Verify that for any $A$, an $m \times n$ matrix, the following holds:

$$\frac{1}{\sqrt{n}} \|A\|_\infty \leq \|A\|_2 \leq \sqrt{m} \, \|A\|_\infty .$$

Exercise 2. Suppose $A^* = A$. Find the exact relation between the eigenvalues and singular values of $A$. Does this hold if $A$ is not conjugate symmetric?

Exercise 3. Show that if $\operatorname{rank}(A) = 1$, then $\|A\|_F = \|A\|_2$.

Exercise 4. This problem leads you through the argument for the existence of the SVD, using an iterative construction. Showing that $A = U \Sigma V^*$, where $U$ and $V$ are unitary matrices, is equivalent to showing that $U^* A V = \Sigma$.

a) Argue from the definition of $\|A\|_2$ that there exist unit vectors (measured in the 2-norm) $x \in \mathbb{C}^n$ and $y \in \mathbb{C}^m$ such that $Ax = \sigma y$, where $\sigma = \|A\|_2$.

b) We can extend both $x$ and $y$ above to orthonormal bases, i.e. we can find unitary matrices $V_1$ and $U_1$ whose first columns are $x$ and $y$ respectively:

$$V_1 = \begin{pmatrix} x & \tilde{V}_1 \end{pmatrix}, \qquad U_1 = \begin{pmatrix} y & \tilde{U}_1 \end{pmatrix}.$$

Show that one way to do this is via Householder transformations, as follows:

$$V_1 = I - 2 \frac{h h^*}{h^* h} , \qquad h = x - \begin{pmatrix} 1 & 0 & \cdots & 0 \end{pmatrix}^* ,$$

and likewise for $U_1$.

c) Now define $A_1 = U_1^* A V_1$. Why is $\|A_1\|_2 = \|A\|_2$?

d) Note that

$$A_1 = \begin{pmatrix} y^* A x & y^* A \tilde{V}_1 \\ \tilde{U}_1^* A x & \tilde{U}_1^* A \tilde{V}_1 \end{pmatrix} = \begin{pmatrix} \sigma & w^* \\ 0 & B \end{pmatrix}.$$

What is the justification for claiming that the lower left element in the above matrix is 0?

e) Now show that

$$\Bigl\| A_1 \begin{pmatrix} \sigma \\ w \end{pmatrix} \Bigr\|_2 \geq \sigma^2 + w^* w ,$$

and combine this with the fact that $\|A_1\|_2 = \|A\|_2 = \sigma$ to deduce that $w = 0$, so

$$A_1 = \begin{pmatrix} \sigma & 0 \\ 0 & B \end{pmatrix}.$$

At the next iteration, we apply the above procedure to $B$, and so on. When the iterations terminate, we have the SVD. [The reason that this is only an existence proof and not an algorithm is that it begins by invoking the existence of $x$ and $y$, but does not show how to compute them. Very good algorithms do exist for computing the SVD; see Golub and Van Loan's classic, Matrix Computations, Johns Hopkins Press, 1989. The SVD is a cornerstone of numerical computations in a host of applications.]

Exercise 5. Suppose the $m \times n$ matrix $A$ is decomposed in the form

$$A = U \begin{pmatrix} \Sigma & 0 \\ 0 & 0 \end{pmatrix} V^* ,$$

where $U$ and $V$ are unitary matrices, and $\Sigma$ is an invertible $r \times r$ matrix (the SVD could be used to produce such a decomposition). Then the "Moore-Penrose inverse", or pseudo-inverse, of $A$, denoted by $A^+$, can be defined as the $n \times m$ matrix

$$A^+ = V \begin{pmatrix} \Sigma^{-1} & 0 \\ 0 & 0 \end{pmatrix} U^* .$$

(You can invoke it in Matlab with pinv(A).)

a) Show that $A^+ A$ and $A A^+$ are symmetric, and that $A A^+ A = A$ and $A^+ A A^+ = A^+$. (These four conditions actually constitute an alternative definition of the pseudo-inverse.)

b) Show that when $A$ has full column rank, then $A^+ = (A^* A)^{-1} A^*$, and that when $A$ has full row rank, then $A^+ = A^* (A A^*)^{-1}$.

c) Show that, of all $x$ that minimize $\|y - Ax\|_2$ (and there will be many, if $A$ does not have full column rank), the one with smallest length $\|x\|_2$ is given by $\hat{x} = A^+ y$.
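The four conditions in Exercise 5(a) are easy to confirm numerically for a rank-deficient matrix; this is an illustration, not a proof. A sketch, assuming NumPy (np.linalg.pinv corresponds to the Matlab pinv mentioned above) and a hypothetical rank-one test matrix:

```python
import numpy as np

u = np.array([[1.0], [1.0], [0.0]])
A = u @ np.array([[2.0, -1.0]])          # 3x2 matrix with rank 1
Ap = np.linalg.pinv(A)                   # pseudo-inverse, computed via the SVD

print(np.allclose(A @ Ap @ A, A))        # A A+ A = A
print(np.allclose(Ap @ A @ Ap, Ap))      # A+ A A+ = A+
print(np.allclose((A @ Ap).T, A @ Ap))   # A A+ symmetric
print(np.allclose((Ap @ A).T, Ap @ A))   # A+ A symmetric

# Minimum-norm least-squares solution of y ~ Ax, as in part (c).
y = np.array([1.0, 2.0, 3.0])
print(Ap @ y)
```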

Exercise 6. All the matrices in this problem are real. Suppose

$$A = Q \begin{pmatrix} R \\ 0 \end{pmatrix},$$

with $Q$ being an $m \times m$ orthogonal matrix and $R$ an $n \times n$ invertible matrix. (Recall that such a decomposition exists for any matrix $A$ that has full column rank.) Also let $Y$ be an $m \times p$ matrix of the form

$$Y = Q \begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix},$$

where the partitioning in the expression for $Y$ is conformable with the partitioning for $A$.

(a) What choice $\hat{X}$ of the $n \times p$ matrix $X$ minimizes the Frobenius norm, or equivalently the squared Frobenius norm, of $Y - AX$? In other words, find

$$\hat{X} = \operatorname{argmin} \|Y - AX\|_F .$$

Also determine the value of $\|Y - A\hat{X}\|_F$. (Your answers should be expressed in terms of the matrices $Q$, $R$, $Y_1$ and $Y_2$.)

(b) Can your $\hat{X}$ in (a) also be written as $(A^T A)^{-1} A^T Y$? Can it be written as $A^+ Y$, where $A^+$ denotes the (Moore-Penrose) pseudo-inverse of $A$?

(c) Now obtain an expression for the choice $\bar{X}$ of $X$ that minimizes

$$\|Y - AX\|_F^2 + \|Z - BX\|_F^2 ,$$

where $Z$ and $B$ are given matrices of appropriate dimensions. (Your answer can be expressed in terms of $A$, $B$, $Y$, and $Z$.)

Exercise 7 (Structured Singular Values). Given a complex square matrix $A$, define the structured singular value function as follows:

$$\mu_{\mathbf{\Delta}}(A) = \frac{1}{\min_{\Delta \in \mathbf{\Delta}} \{ \sigma_{\max}(\Delta) \mid \det(I - \Delta A) = 0 \}} ,$$

where $\mathbf{\Delta}$ is some set of matrices.

a) If $\mathbf{\Delta} = \{ \delta I : \delta \in \mathbb{C} \}$, show that $\mu_{\mathbf{\Delta}}(A) = \rho(A)$, where $\rho$ is the spectral radius of $A$, defined as $\rho(A) = \max_i |\lambda_i|$, and the $\lambda_i$'s are the eigenvalues of $A$.

b) If $\mathbf{\Delta} = \{ \Delta : \Delta \in \mathbb{C}^{n \times n} \}$, show that $\mu_{\mathbf{\Delta}}(A) = \sigma_{\max}(A)$.

c) If $\mathbf{\Delta} = \{ \operatorname{diag}(\delta_1, \ldots, \delta_n) \mid \delta_i \in \mathbb{C} \}$, show that

$$\mu_{\mathbf{\Delta}}(A) = \mu_{\mathbf{\Delta}}(D^{-1} A D) \leq \sigma_{\max}(D^{-1} A D) ,$$

where $D \in \{ \operatorname{diag}(d_1, \ldots, d_n) \mid d_i > 0 \}$.

Exercise 8. Consider again the structured singular value function of a complex square matrix $A$ defined in the preceding problem. If $A$ has more structure, it is sometimes possible to compute $\mu_{\mathbf{\Delta}}(A)$ exactly. In this problem, assume $A$ is a rank-one matrix, so that we can write $A = u v^*$, where $u, v$ are complex vectors of dimension $n$. Compute $\mu_{\mathbf{\Delta}}(A)$ when

(a) $\mathbf{\Delta} = \{ \operatorname{diag}(\delta_1, \ldots, \delta_n) \mid \delta_i \in \mathbb{C} \}$;

(b) $\mathbf{\Delta} = \{ \operatorname{diag}(\delta_1, \ldots, \delta_n) \mid \delta_i \in \mathbb{R} \}$.

To simplify the computation, minimize the Frobenius norm of $\Delta$ in the definition of $\mu_{\mathbf{\Delta}}(A)$.