
CNS 185: A Brief Review of Linear Algebra

An understanding of linear algebra is critical as a stepping-off point for understanding neural networks. This handout includes basic definitions, then quickly progresses to elementary but powerful techniques such as eigenbases. For your private edification, a few exercises are included, identified by bullets (•); some exercises come with hints and answers. Take what you will from this handout, but be forewarned that future problem sets will require most of the concepts developed here, so it behooves you to be comfortable with them. Don't delay in asking a TA if you can't figure it out on your own.

1 STARTING DEFINITIONS

1.1 Matrix structure

Matrices are most often represented as rectangular arrays of scalars. (In the examples presented, scalars will be real numbers, but in general they can be complex.) The m × n matrix A has m rows and n columns. The subscript notation A_i is used to reference the ith row of the matrix, and A_{ij} is used to reference the scalar in the ith row and jth column of A. For example, A below is a 2 × 3 matrix:

    A = \begin{pmatrix} 4 & 2 & -5 \\ 1 & 0 & -8 \end{pmatrix};   A_{13} = -5

A column vector (quite often referred to simply as a vector) is an n × 1 matrix, where n is referred to as the dimension of the vector. The scalar v_i is the ith element of vector v. A matrix with the same number of rows and columns is, not surprisingly, referred to as a square matrix. A commonly used notational convention is to use capital letters (i.e., A) to denote matrices and lower case letters (i.e., v) to denote vectors.

1.2 Matrix transpose

The transpose of the m × n matrix A is denoted A^T. A^T is an n × m matrix whose elements are:

    (A^T)_{ij} = A_{ji}

The transpose of a column vector is called a row vector. An object twice transposed will produce the original object: (A^T)^T = A.

1.3 Addition and multiplication

Adding two matrices A and B results in a matrix whose elements are the sums of the corresponding elements from A and B:

    If C = A + B, then C_{ij} = A_{ij} + B_{ij}

(A and B must have the same dimensions to be able to add them together.) Addition is commutative and associative, just like regular addition. A matrix A multiplied by a scalar k produces a new matrix B = kA whose elements are the elements of A each multiplied by k. Multiplying two matrices together is more complicated: multiplying the m × n matrix A by the n × p matrix B produces an m × p matrix C = AB whose elements are defined to be:

    If C = AB, then C_{ik} = \sum_{j=1}^{n} A_{ij} B_{jk}

In this example, a 4 × 2 matrix is multiplied by a 2 × 3 matrix to produce a 4 × 3 matrix:

    \begin{pmatrix} 2 & 3 \\ 4 & 0 \\ 3 & -2 \\ 5 & 1 \end{pmatrix}
    \begin{pmatrix} 1 & 2 & 0 \\ 5 & -1 & 3 \end{pmatrix}
    =
    \begin{pmatrix} 17 & 1 & 9 \\ 4 & 8 & 0 \\ -7 & 8 & -6 \\ 10 & 9 & 3 \end{pmatrix}

Note that matrix multiplication can only be performed between two matrices A and B if A has exactly as many columns as B has rows. Like ordinary multiplication, matrix multiplication is associative and distributive, but unlike ordinary multiplication, it is not commutative:

    AB ≠ BA, in general

From the definitions of multiplication and transpose, we derive the following identity:

    (AB)^T = B^T A^T

1.4 Inner product

The inner product (also known as the dot product) of n-dimensional vectors x and y is defined as x^T y, which is a scalar. (When working with complex vectors, we use the inner product x^† y, which returns a real value when y = x. x^† is the complex conjugate of the transpose of x. The complex conjugate x̄ has Re[x̄_i] = Re[x_i] and Im[x̄_i] = -Im[x_i].) By our definitions of matrix transpose and matrix multiplication, this means that the inner product is the sum of the products of corresponding elements from the two vectors:

    x^T y = \sum_{i=1}^{n} x_i y_i

If the inner product of two vectors is zero, they are said to be orthogonal, which has the usual geometric connotation of perpendicularity.

1.5 Square matrices

The diagonal of an n × n square matrix A consists of the elements A_{ii} running diagonally from the top left corner to the bottom right. A diagonal matrix is a matrix which has zeroes everywhere off the diagonal.
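As a quick numerical check of these definitions, here is a minimal sketch in Python with NumPy (an assumption on my part; the handout itself contains no code, and any linear algebra package would do):

    # Check the worked 4x2 times 2x3 product and the transpose identity.
    import numpy as np

    A = np.array([[2, 3], [4, 0], [3, -2], [5, 1]])
    B = np.array([[1, 2, 0], [5, -1, 3]])

    C = A @ B  # C[i, k] = sum over j of A[i, j] * B[j, k]
    print(C)   # [[17 1 9], [4 8 0], [-7 8 -6], [10 9 3]], as above

    # (AB)^T = B^T A^T
    assert np.array_equal(C.T, B.T @ A.T)

    # Inner product and orthogonality: x^T y = 0 means x and y are orthogonal.
    x = np.array([1.0, 2.0])
    y = np.array([-2.0, 1.0])
    print(x @ y)  # 0.0, so x and y are orthogonal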

The symbol I is reserved for a particular diagonal matrix known as the identity matrix, which has ones along its diagonal and zeroes elsewhere. It is the multiplicative identity for matrix multiplication of square matrices. In other words, given any n × n square matrix A, it has the following property: AI = IA = A. The n × n square matrix A is called invertible if there exists a matrix, denoted A^{-1}, which satisfies:

    A A^{-1} = A^{-1} A = I

If A^{-1} exists, it is called the inverse matrix. If A^{-1} does not exist, A is called a singular matrix. The inverse of the inverse matrix is simply the original matrix.

• Show that (A^T)^{-1} = (A^{-1})^T

• Show that (AB)^{-1} = B^{-1} A^{-1}

All square matrices have a particular scalar value associated with them, known as the determinant, which is written as

    det A = |A|

For two dimensions, the formula for calculating the determinant is simple:

    \begin{vmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{vmatrix} = A_{11} A_{22} - A_{12} A_{21}

In general, the algebraic formula for determinants is more complicated, but there is a simple and very useful recursive definition (which you can look up in any linear algebra book). For your amusement, the algebraic formula for the determinant of an n × n matrix can be summarized as

    |A| = \sum_{\pi} \mathrm{sgn}(\pi) \, A_{1\pi_1} A_{2\pi_2} \cdots A_{n\pi_n}

where the sum is over all n! permutations π of (1 ... n), and sgn(π) is +1 if π is an even permutation, else -1 if π is an odd permutation. (Even and odd refer to how many swaps of adjacent elements are required to transform (1 ... n) into π. For n = 3, the even permutations are {(1 2 3), (2 3 1), (3 1 2)}.) Happily, determinants can be quite useful even without calculating them. A few facts about determinants include |AB| = |A| |B|, and that if the determinant of a matrix is zero, the matrix is singular, which we'll use below.

2 EIGENVALUES AND EIGENVECTORS

A simple way to find if a matrix A is invertible or not is to find its determinant, since

    det A = 0 if and only if A is not invertible

If A is invertible, the only x satisfying Ax = 0 is x = 0 (why?). On the other hand, if A is not invertible, there can be interesting non-zero solutions for x.
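These determinant facts are easy to poke at numerically. A minimal sketch, again assuming NumPy:

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((3, 3))
    B = rng.standard_normal((3, 3))

    # |AB| = |A| |B|
    assert np.isclose(np.linalg.det(A @ B),
                      np.linalg.det(A) * np.linalg.det(B))

    # (AB)^{-1} = B^{-1} A^{-1}
    assert np.allclose(np.linalg.inv(A @ B),
                       np.linalg.inv(B) @ np.linalg.inv(A))

    # A singular matrix (det = 0) has interesting non-zero solutions to Sx = 0.
    S = np.array([[1.0, 2.0], [2.0, 4.0]])  # second row is twice the first
    print(np.linalg.det(S))                 # 0.0, so S is singular
    x = np.array([2.0, -1.0])               # a non-zero vector in the null space
    print(S @ x)                            # [0. 0.]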

• Find a condition for which the equation Ax = λx (λ a scalar) has interesting non-zero solutions for x. Use this condition to write an equation that λ must satisfy in order to get non-zero solutions to the following equation:

    \begin{pmatrix} 5 & -1 \\ -2 & 4 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \lambda \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}        (1)

• Solve the equation you got for λ. You will get two possible values. Using one of them, find values for x_1 and x_2 that satisfy equation (1). Note that if x satisfies Ax = λx, so does any scalar multiple of x. So you won't be able to solve for a unique x_1 and x_2, just for the direction that x should lie in. That direction is called the eigenvector direction, and all vectors parallel to it are eigenvectors with the same eigenvalue. Now use the other λ to find the other eigenvector direction.

To restate: if x ≠ 0 and Ax = λx, then λ is an eigenvalue of A and x is an eigenvector of A with eigenvalue λ. For larger square matrices, we can find eigenvalues and eigenvectors using the same approach you used for the 2 × 2 matrix. First, we look for values of λ such that A - λI is singular, i.e., |A - λI| = 0. Using the formula for determinants, this leads to a polynomial of degree n, which is called the characteristic polynomial of A. You will recall from algebra that every polynomial of degree n has exactly n (not necessarily distinct) complex roots (some of which may be real, of course). Therefore, every matrix A has exactly n (not necessarily distinct and possibly complex) eigenvalues. Once the eigenvalues are known, the eigenvectors can be determined.

• Find a matrix A for which 0 is an eigenvalue, and find all the eigenvectors.

A common convention is to choose eigenvectors to be unit vectors, i.e., x^T x = 1. (If the eigenvectors are complex, there is no obvious way to find a unique representation for the eigenvector by normalizing, since an eigenvector multiplied by any complex number is still an eigenvector, with the same eigenvalue. For convenience one can set x^† x = 1.)

2.1 Some words about eigenvalues

If you write the eigenvector directions as column vectors and put them side by side, you get a new matrix; call it E.

• Convince yourself that since the columns of E are eigenvectors, the following is true:

    AE = E \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix}

where λ_1 is the first column's eigenvalue, and λ_2 is the second column's eigenvalue. We can multiply on the right by E^{-1} (assuming E is invertible) to get

    A = E \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix} E^{-1}

This expression helps us describe the true significance of the matrix A, and why eigenvalues are so important.
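If you want to check your hand calculation for equation (1), here is a minimal NumPy sketch (again my own addition, not the handout's):

    import numpy as np

    A = np.array([[5.0, -1.0], [-2.0, 4.0]])

    # np.linalg.eig returns the eigenvalues and a matrix whose
    # columns are the corresponding (unit-norm) eigenvectors.
    lams, E = np.linalg.eig(A)
    print(lams)  # the two roots of the characteristic polynomial

    # Each column e of E satisfies A e = lam e.
    for lam, e in zip(lams, E.T):
        assert np.allclose(A @ e, lam * e)
        assert np.allclose(e @ e, 1.0)  # the unit-vector convention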

Multiplying on the left by A is the same as performing the following sequence of operations:

1. First multiplying by E^{-1}. Think of this as doing a linear change of coordinates. That is, we change coordinates to some special coordinate system.

2. In that coordinate system, multiply by \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix}. But this is a particularly easy matrix to multiply with, since the coordinates don't mix! We can easily visualize what is going on: the first coordinate gets stretched (or squeezed) by λ_1, the second by λ_2.

3. Then go back to your original coordinates, by multiplying by the inverse of E^{-1}, namely E.

That is, there is some special coordinate system in which multiplying by A just stretches the two coordinates independently. Clearly this is the natural coordinate system for the problem, the one we want to be thinking in. Many times it is enough to know the eigenvalues: we know we could always transform the problem into the special system if we wanted to. We just pretend that we've already done the transformation. This illustrates a very important concept that cannot be stressed enough: the real guts of a matrix, what it really does, don't depend on what coordinate system we use to describe it. Here, if A is a positive definite matrix, then E is an orthonormal matrix and represents simply a rotation. (Positive definite matrix: a matrix M such that x^T M x > 0 for all non-zero x. Orthonormal matrix: one where M^T M = I. The important point is that if these conditions are satisfied, the matrix E is just a coordinate rotation and/or a reflection. Though reflections aren't properly rotations, we almost always include them when we say, abusing notation, "rotation matrix".) Who cares if we rotate coordinates around? They're our coordinates, not the physical problem's. The eigenvalues are what really matter.

• Find the eigenvalues and eigenvectors of the following two matrices:

    \begin{pmatrix} 1 & 3 \\ 3 & 1 \end{pmatrix}   and   \begin{pmatrix} 10 & 2 & 2 \\ 2 & 22 & -5 \\ 2 & -5 & 22 \end{pmatrix}

The arguments we used above all rely on our casual assumption that E is invertible. This is usually an acceptable assumption to make, for the sorts of matrices commonly encountered in neural network theory. But it can be helpful to have some understanding of the other possibilities. (A quick summary: Suppose the n (possibly complex) eigenvalues of M are distinct. Then to each eigenvalue there is a unique (up to multiplication by a complex number) eigenvector, and all the eigenvectors are linearly independent (i.e., they span C^n). Now suppose there are m eigenvalues with the same value. In this case, unfortunately, there might not be m linearly independent eigenvectors all with the same eigenvalue, in which case E must be singular. In both these cases, there is nothing special about the eigenvalue 0; the issue is only whether an eigenvalue is a multiple root of the characteristic polynomial.)

2.2 Some more useful facts about eigenvectors

In the following, assume when necessary that E is invertible, with eigenvectors e_1 ... e_n and corresponding eigenvalues λ_1 ... λ_n.

• If C has eigenvectors and eigenvalues {e_i, λ_i}, then for any scalar μ the matrix B = C - μI has eigenvectors and eigenvalues {e_i, λ_i - μ}.

• If C is a real symmetric matrix (i.e., C_{ij} = C_{ji}), then all the eigenvalues of C are real. We can also choose all the eigenvectors to be real. Hint: Start with a potentially complex eigenvalue λ and its potentially complex eigenvector x, satisfying Cx = λx. By convention, we've chosen a unit eigenvector (why can we always do this?) so x^† x = 1. Combine these two equations and C̄ = C to show that λ̄ = λ. We now need to show that x is real; look at what C does to the real and imaginary components of x independently.
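Here is a minimal numerical illustration of the three-step picture, using the first exercise matrix above (the code and NumPy are my assumptions; np.linalg.eigh is NumPy's eigendecomposition routine for symmetric matrices):

    import numpy as np

    A = np.array([[1.0, 3.0], [3.0, 1.0]])  # the first exercise matrix
    lams, E = np.linalg.eigh(A)              # eigh: for symmetric matrices
    Lam = np.diag(lams)

    # A = E Lam E^{-1}
    assert np.allclose(A, E @ Lam @ np.linalg.inv(E))

    # Multiplying by A = the three-step recipe above.
    v = np.array([2.0, -1.0])
    v_special = np.linalg.inv(E) @ v          # 1. change to special coordinates
    v_special = Lam @ v_special               # 2. stretch; coordinates don't mix
    assert np.allclose(E @ v_special, A @ v)  # 3. change back; same as A v

    # A is symmetric, so E can be chosen orthonormal: E^T E = I.
    assert np.allclose(E.T @ E, np.eye(2))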

• If C is a real symmetric matrix, any two eigenvectors of C with different eigenvalues are orthogonal. Hint: if x_1 and x_2 are eigenvectors with eigenvalues λ_1 and λ_2, use the definition of an eigenvector to show that λ_1 x_1^T x_2 = λ_2 x_1^T x_2, or something similar.

• Let x be a random vector, and define the cross-correlation matrix:

    C_{ij} = ⟨x_i x_j⟩

that is, the ijth component of C is the expected value (i.e., mean value) of the product x_i x_j. C is symmetric, and all its eigenvalues are non-negative, i.e., λ_i ≥ 0. Hint: to prove that all λ_i ≥ 0, it is sufficient to show that y^T C y ≥ 0 for any vector y.

• Think about the vector defined by the row of intensities in one horizontal line on a television screen. What do you think the cross-correlation matrix C of that vector looks like, averaged over many pictures? What do you think the principal eigenvector of C looks like? (The principal eigenvector is the one with largest eigenvalue.) What about the next few principal components? Hint 1: The cross-correlation matrix has to be all positive, if the pixel intensities are positive. It is plausible that the correlation between two pixels should just be a function of their separation, so that the matrix should have a banded symmetrical appearance. (A matrix for which C_{ij} = c(i - j) is called a Toeplitz matrix.) The correlation must be biggest on the diagonal because a pixel is most correlated with itself. Away from the diagonal, the correlations must get smaller, but not necessarily monotonically. Hint 2: The principal eigenvector of an all-positive matrix must be an all-positive vector. Imagine multiplying a not quite all-positive vector repeatedly by C, and think what happens to it; repeated multiplication yields a vector looking more and more like the principal eigenvector. Think of a monotonic Toeplitz C and you should confirm that this vector must end up looking like a symmetrical hump. Hint 3: What is the next thing to having no changes of sign in a vector?

We can express a vector w in terms of the complete orthonormal set of eigenvectors e_i:

    w = \sum_i \omega_i e_i,   where \omega_i = w^T e_i

The ω_i are the components of w in the eigenvector basis.

• If Cw = b, we can write an explicit solution for w = C^{-1} b in terms of b and {e_i, λ_i} (assuming all λ_i > 0):

    C^{-1} b = \sum_i \frac{e_i^T b}{\lambda_i} e_i
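This eigenbasis formula for C^{-1} b is easy to test numerically; a minimal sketch (the particular C and b are hypothetical numbers of mine):

    import numpy as np

    C = np.array([[2.0, 1.0], [1.0, 3.0]])  # symmetric, eigenvalues > 0
    b = np.array([1.0, 5.0])

    lams, E = np.linalg.eigh(C)  # columns of E are orthonormal eigenvectors

    # w = C^{-1} b = sum over i of (e_i^T b / lam_i) e_i
    w = sum((e @ b) / lam * e for lam, e in zip(lams, E.T))

    assert np.allclose(C @ w, b)                  # it solves Cw = b
    assert np.allclose(w, np.linalg.solve(C, b))  # and matches a direct solve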

3 LINEAR DIFFERENTIAL EQUATIONS

The simplest linear differential equation is the one-variable equation

    dx/dt = λx

where λ is a scalar.

• Confirm for yourself that its solution is an exponential,

    x(t) = x(0) e^{λt}        (2)

where x(0) is the initial condition.

Clearly, x = 0 is a fixed point of this equation, since then ẋ = 0; that is, x stays put. (ẋ is a notational variant of dx/dt, and ẍ is a notational variant of d²x/dt².) We can ask about the stability of this fixed point: if we were to add a little disturbance to x, would x return to the fixed point, or would it shoot off in some direction? The stability of fixed points is of great practical importance in a world full of natural small random disturbances. For example, the bottom of a spherical bowl is a stable fixed point: fruit stays down there. But the top of a glass sphere is an unstable fixed point: we could very carefully balance an apple on top of it, but any small disturbance, and the apple will fall off.

• For dx/dt = λx, convince yourself that x = 0 is a stable fixed point if λ < 0 and is unstable if λ > 0.

• The phrasing and solution to the above problem assume λ is real. What if λ is complex? Convince yourself that equation (2) still holds. The imaginary part of λ just represents an oscillation (e^{iωt} = cos ωt + i sin ωt). So the condition above, to be completely general, should really read "stable fixed point if the real part of λ < 0, unstable if the real part of λ > 0". What happens if the real part of λ = 0 exactly?

Now consider the following equation:

    ẍ = -εẋ - ωx

One of the nice things about linear differential equations is that we can always take a single n-th order equation and turn it into n coupled first-order equations by rewriting some of the variables. So we define x_1 ≡ ẋ, x_2 ≡ x, to get the equivalent equations

    \begin{pmatrix} ẋ_1 \\ ẋ_2 \end{pmatrix} = \begin{pmatrix} -ε & -ω \\ 1 & 0 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}        (3)

• Convince yourself that these equations indeed represent the same system.

You will have noticed that we already wrote this down in matrix form. We will now get a chance to use what we saw in section 2. Call the vector on the left ẋ, the matrix on the right-hand side A, and the vector on the right-hand side x, so the equation is ẋ = Ax. We said that we can find special coordinates where our matrix doesn't mix coordinates (that is, it is a diagonal matrix). Suppose that we find matrices E and Λ, where Λ is a diagonal matrix that holds the eigenvalues, such that A = EΛE^{-1}, as in section 2.1. Then

    ẋ = EΛE^{-1} x

Multiplying on the left by E^{-1}, and remembering that E^{-1}, as a linear operation, commutes with differentiation by time, we get

    d/dt (E^{-1} x) = Λ (E^{-1} x)

Let's just say that we define new coordinates x' = E^{-1} x. Then we get an equation that looks like

    \begin{pmatrix} ẋ'_1 \\ ẋ'_2 \end{pmatrix} = \begin{pmatrix} λ_1 & 0 \\ 0 & λ_2 \end{pmatrix} \begin{pmatrix} x'_1 \\ x'_2 \end{pmatrix}

But this is just two completely separate equations, each one in the simple single-variable form we saw at the beginning of this section! We know how to solve that, and how to know whether their fixed point is stable; and since these equations are the same as our original ones (simply represented in different coordinates), if these are stable so are the original ones, and vice-versa.

• In the following system,

    \begin{pmatrix} ẋ_1 \\ ẋ_2 \end{pmatrix} = \begin{pmatrix} -2 & 0 \\ 0 & 3 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}

is the fixed point (0,0) stable or unstable? Why? You need to consider both equations at once.
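Numerically, the stability question is just a question about the real parts of the eigenvalues of A. A minimal sketch (the helper name and the parameter values are my own, chosen so as not to give away the exercises):

    import numpy as np

    def is_stable(A):
        # The origin of xdot = A x is a stable fixed point exactly when
        # every eigenvalue of A has negative real part.
        return bool(np.all(np.linalg.eigvals(A).real < 0))

    # A diagonal system: the coordinates don't mix, and both rates are negative.
    print(is_stable(np.array([[-1.0, 0.0], [0.0, -4.0]])))  # True

    # Equation (3) in matrix form, for hypothetical values eps = 1, omega = 2:
    eps, omega = 1.0, 2.0
    A = np.array([[-eps, -omega], [1.0, 0.0]])
    print(np.linalg.eigvals(A))  # a complex pair with real part -1/2
    print(is_stable(A))          # True: a damped oscillation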

• In equation (3), if ε = 3 and ω = 1, is (0,0) stable or unstable? How about if ε = 2 and ω = 2?

Note that the fixed point doesn't always have to be at (0,0). We just put it there in these examples for simplicity. The eigenvalue analysis still holds, however.

4 THE TRACE IS THE SUM OF THE EIGENVALUES

Take an n by n matrix A. Then

    Tr A ≡ \sum_i A_{ii}

is called the trace of A. Let λ_1, ..., λ_n be the eigenvalues of A, with corresponding eigenvectors e_1, ..., e_n. Then Tr A = \sum_i λ_i. We will show this below in two ways. First note what this means for dynamical systems: if the matrix that describes the linearization about a given fixed point of the dynamics has trace equal to zero, then either (1) all its eigenvalues have zero real part; or (2) some have a negative real part and some have a positive real part. In the second (more usual) case, therefore, the fixed point is a saddle and is unstable.

Method 1 (easy but not beautiful)

    Tr (AB) = \sum_i \left( \sum_j A_{ij} B_{ji} \right)

simply from the definition of matrix multiplication. The order in which we do the sums doesn't matter, however, so we can quickly see that

    Tr (AB) = \sum_j \sum_i A_{ij} B_{ji} = \sum_j \sum_i B_{ji} A_{ij} = Tr (BA)

that is, trace is commutative. Now recall (if necessary from the basic math class) that A can be written as A = EΛE^{-1}, where the columns of E are the eigenvectors of A and Λ is a diagonal matrix with the eigenvalues of A as its diagonal elements. (We have assumed E is invertible.) Then

    Tr A = Tr (EΛE^{-1}) = Tr ((EΛ)E^{-1}) = Tr (E^{-1}(EΛ)) = Tr ((E^{-1}E)Λ) = Tr Λ

This last is just the sum of the eigenvalues.

Method 2 (much more interesting concepts here)

Recall that for any square matrix the eigenvalues are found by obtaining the solutions to the characteristic polynomial:

    det(A - λI) = 0
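Both trace facts are one-liners to check numerically; a minimal sketch with random matrices (NumPy again assumed):

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.standard_normal((4, 4))
    B = rng.standard_normal((4, 4))

    # Tr(AB) = Tr(BA), even though AB != BA in general
    assert np.isclose(np.trace(A @ B), np.trace(B @ A))

    # Tr A = sum of the eigenvalues; for a real A, complex eigenvalues come
    # in conjugate pairs, so the imaginary parts cancel in the sum.
    assert np.isclose(np.trace(A), np.sum(np.linalg.eigvals(A)).real)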

Suppose, for example, that A is a 2-by-2 matrix. Then we end up with a quadratic, something like

    λ² + αλ + β = 0

Now we make an interesting statement: the matrix A satisfies the same characteristic polynomial as its eigenvalues. In the example above, this means that

    A² + αA + βI = 0        (4)

To see this, suppose the eigenvectors of A span the entire space R^n. (That is, there are n of them and they are linearly independent; equivalent to the condition that E be invertible in Method 1 above.) Then any vector v can be represented as a sum of the eigenvectors times some linear coefficients: v = \sum_i c_i e_i. Multiply the left-hand side of equation (4) above on the right by v to get

    (A² + αA + βI) \sum_i c_i e_i = \sum_i c_i (A² + αA + βI) e_i = \sum_i c_i e_i (λ_i² + αλ_i + β) = 0        (5)

since λ_i² + αλ_i + β = 0 for every i. (Remember, the e_i are eigenvectors.) Hence the left-hand side of equation (4), multiplied by any vector, always gives zero. This necessarily means that (4) is true.

OK, now we know that (4) is true. What does this tell us about A? Well, it describes A in terms of what it does; that is, in terms of how it operates. A is a linear operator (it operates on vectors, transforming them to new vectors), and when you apply it twice, add α times applying it once, and add β times the vector you started from, you get zero. This tells us about A without making any reference to the matrix representation of A, or to the coordinates A might be described in. So if we were to rotate, or change coordinates, so that the matrix representation of A were to change, this wouldn't change (4). The characteristic polynomial is invariant to coordinate transformations of the form TAT^{-1}. That is, the coefficients in the polynomial will not change. Since TAT^{-1} is precisely the form of the transformation used to diagonalize A, we know that the characteristic polynomial is invariant under diagonalization.

Thinking of matrices as representation-independent operators is a very powerful concept. Let's use it now, and move in for the kill with respect to trace being the sum of the eigenvalues. Define Trace as minus the second coefficient of the characteristic polynomial. (For n by n matrices I mean by "second coefficient" the constant that multiplies λ^{n-1}; and we choose as sign convention that the λⁿ term in the polynomial be positive.) In the example above, say, this would be -α. Trace is then clearly the sum of diagonal terms (because of how det(A - λI) is calculated), whatever representation A is in. And if we choose the representation that diagonalizes A, this is the sum of the eigenvalues. QED.

Another fun one is that the determinant of a square matrix is always just the product of the eigenvalues.
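Equation (4) is the 2 × 2 instance of the Cayley-Hamilton theorem, and it too can be checked in a few lines (a sketch, reusing the matrix from equation (1); for a 2 × 2 matrix, α = -Tr A and β = det A):

    import numpy as np

    A = np.array([[5.0, -1.0], [-2.0, 4.0]])  # the matrix from equation (1)

    alpha = -np.trace(A)     # coefficient of lam in the characteristic polynomial
    beta = np.linalg.det(A)  # constant term

    # Equation (4): A^2 + alpha A + beta I = 0
    assert np.allclose(A @ A + alpha * A + beta * np.eye(2), 0.0)

    # And the determinant is the product of the eigenvalues.
    lams = np.linalg.eigvals(A)
    assert np.isclose(np.prod(lams).real, np.linalg.det(A))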