Tensors for matrix differentiation

Richard Turner

Here are some notes on how to use tensors to find matrix derivatives, and the relation to the $\circ$ (Hadamard), vec, $\otimes$ (Kronecker), vec-transpose and reshape operators. I wrote these notes for myself, and I apologise for any mistakes and confusions. Two sections are currently unfinished: I hope to complete them soon.

1 A tensor notation

Let's set up one useful form of tensor notation, which incorporates the matrix and inner products, the outer product, the Hadamard (MATLAB .* or $\circ$) product, $\operatorname{diag}$ and $\operatorname{diag}^{-1}$. These will be denoted using different combinations of pairs of upstairs and downstairs indices. If we have only 2nd order tensors (and lower), we want to be able to easily convert the result into matrix representation. We have a free choice for the horizontal ordering of indices, and therefore this can be used to denote transposes and the order of multiplication.

\begin{align}
A^i{}_j B^j{}_k &= \sum_j A^i{}_j B^j{}_k \tag{1}\\
&= (AB)^i{}_k \tag{2}\\
A_j{}^i &= (A^T)^i{}_j \tag{3}\\
a^i b_j &= (ab^T)^i{}_j \tag{4}\\
A^i{}_i &= \sum_i A^i{}_i \tag{5}\\
&= \operatorname{tr} A \tag{6}\\
A^i{}_j B^i{}_j &= H^{ij}{}_{klmn} A^k{}_l B^m{}_n \tag{7}\\
&= (A \circ B)^i{}_j \tag{8}\\
H^{ij}{}_{klmn} &= \delta^i_k \delta^j_l \delta^i_m \delta^j_n \tag{9}\\
A^{i,i} &= \operatorname{diag}(A)^i \tag{10}\\
A^i{}_j \delta^i{}_j &= (\operatorname{diag}^{-1} \operatorname{diag} A)^i{}_j \tag{11}
\end{align}

The Kronecker delta $\delta^i_j$ is 1 iff $i = j$ and zero for $i \neq j$. From the matrix perspective, the first indices index the rows and the second the columns. A second order tensor must have one upstairs and one downstairs index. The only way of moving downstairs indices upstairs is to flip ALL indices, and this does not affect anything. Summations occur between one downstairs index and one upstairs index (the Einstein convention). Repeated downstairs or upstairs indices imply a Hadamard or entry-wise product. As an example, the Hadamard product between two vectors (of the same size) is:

$$a \circ b = [a_1, a_2, \ldots, a_I]^T \mathbin{.*} [b_1, b_2, \ldots, b_I]^T = [a_1 b_1, a_2 b_2, \ldots, a_I b_I]^T$$

If we have a bunch of second and/or first order tensors (e.g. $S^i{}_j = B^l{}_j W_k{}^i A_l{}^k$), we can convert them into matrix/vector notation using the order of the indices from left to right, and the rule for the transpose:

1. Make the first and the last indices match the LHS: $S^i{}_j = (W^T)^i{}_k A_l{}^k B^l{}_j$.
2. Transpose the central objects so the indices run consecutively: $S^i{}_j = (W^T)^i{}_k (A^T)^k{}_l B^l{}_j$.
3. Replace with matrix notation: $S = W^T A^T B$.
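The notation maps directly onto numpy's einsum, whose subscript strings play the role of the index pairs above (einsum does not track the upstairs/downstairs distinction, only the contractions). A minimal sketch, with arbitrary sizes and seed:

```python
import numpy as np

rng = np.random.default_rng(0)
A, B = rng.standard_normal((3, 3)), rng.standard_normal((3, 3))
a, b = rng.standard_normal(3), rng.standard_normal(3)

# summed index pair -> matrix product, eqs (1)-(2)
assert np.allclose(np.einsum('ij,jk->ik', A, B), A @ B)
# reversed horizontal index order -> transpose, eq (3)
assert np.allclose(np.einsum('ij->ji', A), A.T)
# two free indices on vectors -> outer product, eq (4)
assert np.allclose(np.einsum('i,j->ij', a, b), np.outer(a, b))
# contracted index pair on one tensor -> trace, eqs (5)-(6)
assert np.isclose(np.einsum('ii->', A), np.trace(A))
# repeated indices at the same level -> Hadamard product, eqs (7)-(8)
assert np.allclose(np.einsum('ij,ij->ij', A, B), A * B)
# diag and diag^{-1} diag, eqs (10)-(11)
assert np.allclose(np.einsum('ii->i', A), np.diag(A))
assert np.allclose(A * np.eye(3), np.diag(np.diag(A)))
```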

2 Basic derivatives

To convert derivatives found using the suffix notation into matrix derivatives, we need to be aware of one more convention. Imagine differentiating a vector $x$ by another vector $y$. The result is a matrix, but we must choose which way round we want the rows and columns. Conventionally the choice is:

$$\frac{\partial x^i}{\partial y^j} = \left(\frac{\partial x}{\partial y}\right)^i{}_j \tag{12}$$

so the chain rule is easy to apply (and intuitive) by a right-hand multiplication:

\begin{align}
\frac{\partial x^i}{\partial z^k} &= \frac{\partial x^i}{\partial y^j}\,\frac{\partial y^j}{\partial z^k} \tag{13}\\
&= \left(\frac{\partial x}{\partial y}\,\frac{\partial y}{\partial z}\right)^i{}_k \tag{14}
\end{align}

We might also want to differentiate a matrix by a matrix. Although the resulting fourth order tensor cannot be represented as a matrix, the object could be required in applying the chain rule (see the example in the following section where we differentiate an object like $\operatorname{tr}(WW^T)$ with respect to $W$), and so we need rules for assigning the indices. Luckily we have enough conventions to unambiguously specify this:

$$\frac{\partial A^i{}_j}{\partial B^k{}_l} = \left(\frac{\partial A}{\partial B}\right)^{i,l}{}_{j,k} \tag{15}$$

where the ordering within the upstairs and downstairs slots is arbitrary. Note that this relation defines all the derivatives you can form with 2nd order tensors and lower (a subset of which are the three types of derivatives that can be represented as matrices).
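A finite-difference check that, under convention (12), Jacobians compose by right-hand multiplication as in (13)-(14); the composed function $x = A\tanh(Bz)$ is an arbitrary illustration, not anything from the text:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 5))
z = rng.standard_normal(5)

y = B @ z                       # y(z)
x = A @ np.tanh(y)              # x(y)

# Convention (12): (dx/dy)_ij = dx_i/dy_j, so:
Jxy = A * (1 - np.tanh(y)**2)   # dx_i/dy_j = A_ij (1 - tanh^2 y_j)
Jyz = B                         # dy_j/dz_k = B_jk
Jxz = Jxy @ Jyz                 # chain rule (13)-(14): right-hand multiplication

# central finite differences of x(z)
eps = 1e-6
num = np.zeros((3, 5))
for k in range(5):
    dz = np.zeros(5); dz[k] = eps
    xp = A @ np.tanh(B @ (z + dz))
    xm = A @ np.tanh(B @ (z - dz))
    num[:, k] = (xp - xm) / (2 * eps)
assert np.allclose(num, Jxz, atol=1e-5)
```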

2.0.1 Some examples

Here are some explicit examples. Find:

$$\frac{df}{dW}, \qquad f = \operatorname{tr}(AWCW^TB) \tag{16}$$

Solution:

\begin{align}
f &= A^i{}_k W^k{}_m C^m{}_n (W^T)^n{}_p B^p{}_i \tag{17}\\
&= A^i{}_k W^k{}_m C^m{}_n W_p{}^n B^p{}_i \tag{18}\\
\frac{df}{dW^a{}_b} &= A^i{}_k\, \delta^k_a \delta^m_b\, C^m{}_n W_p{}^n B^p{}_i \tag{19}\\
&\quad + A^i{}_k W^k{}_m C^m{}_n\, \delta^p_a \delta^n_b\, B^p{}_i \tag{20}\\
&= A^i{}_a C^b{}_n W_p{}^n B^p{}_i \tag{21}\\
&\quad + A^i{}_k W^k{}_m C^m{}_b B^a{}_i \tag{22}\\
&= C^b{}_n (W^T)^n{}_p B^p{}_i A^i{}_a \tag{23}\\
&\quad + (C^T)^b{}_m (W^T)^m{}_k (A^T)^k{}_i (B^T)^i{}_a \tag{24}\\
&= (C W^T B A + C^T W^T A^T B^T)^b{}_a \tag{25}
\end{align}

so that $\frac{df}{dW} = C W^T B A + C^T W^T A^T B^T$. Note the discrepancy with some texts (for example the widely used Matrix Cookbook): their results differ by a transpose, as their definitions for $\partial x/\partial M$ and $\partial M/\partial x$ are not consistent with regard to their use in the chain rule.
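The result (25) is easy to verify by finite differences. The snippet below (arbitrary $4 \times 4$ matrices) also makes the Cookbook remark concrete: the raw entry-wise gradient is the transpose of the matrix in (25), because of the layout convention (15):

```python
import numpy as np

rng = np.random.default_rng(2)
A, W, C, B = (rng.standard_normal((4, 4)) for _ in range(4))
f = lambda W: np.trace(A @ W @ C @ W.T @ B)

analytic = C @ W.T @ B @ A + C.T @ W.T @ A.T @ B.T   # eq (25)

eps = 1e-6
num = np.zeros((4, 4))
for i in range(4):
    for j in range(4):
        E = np.zeros((4, 4)); E[i, j] = eps
        num[i, j] = (f(W + E) - f(W - E)) / (2 * eps)

# num[i,j] = df/dW_ij; under the convention here that entry lives at
# (df/dW)^j_i, so num matches the TRANSPOSE of (25), the Cookbook's layout.
assert np.allclose(num, analytic.T, atol=1e-5)
```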

Find:

$$\frac{d(A^{-1})^i{}_j}{dA^k{}_l} \tag{26}$$

Solution: differentiate the identity $\delta^i{}_j = A^i{}_m (A^{-1})^m{}_j$:

\begin{align}
\frac{d\,\delta^i{}_j}{dA^k{}_l} &= 0 \tag{27}\\
0 &= \frac{dA^i{}_m}{dA^k{}_l}\,(A^{-1})^m{}_j + A^i{}_m\,\frac{d(A^{-1})^m{}_j}{dA^k{}_l} \tag{28}\\
\frac{d(A^{-1})^i{}_j}{dA^k{}_l} &= -(A^{-1})^i{}_k\,(A^{-1})^l{}_j \tag{29}\\
&= -\left[A^{-1}\,\frac{dA}{dA^k{}_l}\,A^{-1}\right]^i{}_j \tag{30}
\end{align}
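A numerical confirmation of (29), with an arbitrary well-conditioned $A$ and an arbitrary index pair $(k, l)$:

```python
import numpy as np

rng = np.random.default_rng(3)
D = 4
A = rng.standard_normal((D, D)) + D * np.eye(D)   # keep A well-conditioned
Ainv = np.linalg.inv(A)

k, l = 1, 3
E = np.zeros((D, D)); E[k, l] = 1.0               # dA/dA^k_l

eps = 1e-6
num = (np.linalg.inv(A + eps * E) - np.linalg.inv(A - eps * E)) / (2 * eps)
pred = -np.outer(Ainv[:, k], Ainv[l, :])          # eq (29): -(A^{-1})_ik (A^{-1})_lj
assert np.allclose(num, pred, atol=1e-5)
```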

3 Differentiating determinants

[Complete this section.]

4 Differentiating structured matrices

Imagine we want to differentiate a matrix with some structure, for example a symmetric matrix. To form the derivatives we can use the matrix-self-derivative:

$$\Gamma^{i,l}{}_{j,k} = \frac{\partial A^i{}_j}{\partial A^k{}_l} \tag{31}$$

so that:

$$\frac{\partial f}{\partial A^k{}_l} = \frac{\partial f}{\partial A^i{}_j}\,\frac{\partial A^i{}_j}{\partial A^k{}_l} \tag{32}$$

When forming differentials we have to be careful to only sum over unique entries in $A$:

$$df = \sum_{\text{unique } k,l} \frac{\partial f}{\partial A^k{}_l}\, dA^k{}_l \tag{33, 34}$$

One ubiquitous form of structured matrices are the symmetric matrices. For this class the matrix-self-derivative is:

$$\frac{\partial S^i{}_j}{\partial S^k{}_l} = \delta^i_k \delta^j_l + \delta^i_l \delta^j_k - \delta^i_j \delta^i_k \delta^i_l \tag{35}$$

The first two terms make sure we count the off-diagonal elements twice, and the third term avoids over-counting of the diagonal and involves some entry-wise products.

4.1 A sermon about symmetric matrices

Let's do a family of derivatives correctly that most people muck up: find $\frac{df(S)}{dS}$ where $S$ is symmetric. Writing $f'(S)^j{}_i = \frac{\partial f(S)}{\partial S^i{}_j}$ for the unconstrained derivative:

\begin{align}
\frac{df(S)}{dS^k{}_l} &= \frac{\partial f(S)}{\partial S^i{}_j}\,\frac{dS^i{}_j}{dS^k{}_l} \tag{36}\\
&= f'(S)^j{}_i \left[\delta^i_k \delta^j_l + \delta^i_l \delta^j_k - \delta^i_j \delta^i_k \delta^i_l\right] \tag{37}\\
&= f'(S)^j{}_i\, \delta^i_k \delta^j_l + f'(S)^j{}_i\, \delta^i_l \delta^j_k - f'(S)^j{}_i\, \delta^i_j \delta^i_k \delta^i_l \tag{38}\\
&= f'(S)^l{}_k + f'(S)^j{}_i\, \delta^i_l \delta^j_k - f'(S)^j{}_i\, \delta^i_j \delta^i_k \delta^i_l \tag{39}\\
&= f'(S)^l{}_k + f'(S)^k{}_l - f'(S)^j{}_i\, \delta^i_j \delta^i_k \delta^i_l \tag{40}\\
&= f'(S)^l{}_k + f'(S)^k{}_l - f'(S)^k{}_k\, \delta^k_l \quad (\text{no sum on } k) \tag{41}\\
&= \left[f'(S) + f'(S)^T - f'(S) \circ I\right]^l{}_k \tag{42}\\
&= \left[2 f'(S) - f'(S) \circ I\right]^l{}_k \tag{43}
\end{align}

where the last line uses the fact that $f'(S)$ is itself symmetric when $S$ is. If we want the differential, we must sum over all the unique elements (and no more):

$$df(S) = \sum_{i \le j} \left[2 f'(S) - f'(S) \circ I\right]^i{}_j\, dS^i{}_j \tag{44}$$
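To check (43)-(44) on a concrete function, take $f(S) = \log\det S$, so that $f'(S) = S^{-1}$; the finite differences below perturb one unique element at a time while keeping $S$ symmetric (an illustrative sketch, arbitrary size and seed):

```python
import numpy as np

rng = np.random.default_rng(4)
D = 4
T = rng.standard_normal((D, D))
S = T @ T.T + D * np.eye(D)               # symmetric positive definite

f = lambda M: np.linalg.slogdet(M)[1]     # f(S) = log det S, so f'(S) = S^{-1}
fp = np.linalg.inv(S)
pred = 2 * fp - np.diag(np.diag(fp))      # eq (43): 2 f'(S) - f'(S) o I

eps = 1e-6
num = np.zeros((D, D))
for k in range(D):
    for l in range(k, D):
        E = np.zeros((D, D))
        E[k, l] = E[l, k] = eps           # perturb one unique element, keeping S symmetric
        num[k, l] = num[l, k] = (f(S + E) - f(S - E)) / (2 * eps)
assert np.allclose(num, pred, atol=1e-5)
```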

Here is a concrete example where people forget the above: imagine finding the ML covariance matrix of a multivariate Gaussian distribution, given some data. The likelihood is given by:

$$P(\{x_n\}_{n=1}^N) = \prod_n \det(2\pi\Sigma)^{-1/2} \exp\left[-\tfrac{1}{2}(x_n - \mu)^T \Sigma^{-1} (x_n - \mu)\right] \tag{45}$$

$$\log P(\{x_n\}_{n=1}^N) = -\frac{N}{2}\left[D \log(2\pi) - \log\det(\Sigma^{-1}) + \operatorname{tr}\left(\Sigma^{-1} X\right)\right] \tag{46}$$

where we have introduced $X = \frac{1}{N}\sum_n (x_n - \mu)(x_n - \mu)^T$. Now we differentiate this with respect to $\Sigma^{-1}$:

\begin{align}
\frac{d \log P(\{x_n\}_{n=1}^N)}{d\Sigma^{-1}} &= \frac{N}{2}\left[\frac{d \log\det(\Sigma^{-1})}{d\Sigma^{-1}} - \frac{d \operatorname{tr}(\Sigma^{-1} X)}{d\Sigma^{-1}}\right] \tag{47}\\
&= \frac{N}{2}\left[2\Sigma - \Sigma \circ I - 2X + X \circ I\right] \tag{48}
\end{align}

These are the correct derivatives. As $2\Sigma - \Sigma \circ I - 2X + X \circ I = 0$ holds iff $\Sigma = X$, people can recover $\Sigma = X$ without using the symmetrised expressions. However, if we wanted to do gradient ascent here, then we should use the following updates:

$$(\Sigma^{-1}_{t+1})^i{}_j = (\Sigma^{-1}_t)^i{}_j + \eta\left[2\Sigma - \Sigma \circ I - 2X + X \circ I\right]^i{}_j \tag{49}$$

applied to the unique (upper triangular) elements of $\Sigma^{-1}$. Most people would instead use:

$$(\Sigma^{-1}_{t+1})^i{}_j = (\Sigma^{-1}_t)^i{}_j + \eta\left[\Sigma - X\right]^i{}_j \tag{50}$$

If we initialise with a symmetric $\Sigma^{-1}_0$ then the incorrect procedure will walk us along the manifold of symmetric matrices towards the ML solution. You might think numerical errors could, in principle, step us off the manifold of symmetric matrices: after all, $(\Sigma^{-1})^i{}_j X^j{}_i$ is invariant so long as $(\Sigma^{-1})^i{}_j + (\Sigma^{-1})^j{}_i = \text{const}$, so there appears to be no pressure to keep $\Sigma^{-1}$ symmetric. Things actually turn out to be worse than this: (45) is not actually a correct expression for a Gaussian distribution if $\Sigma^{-1}$ is not symmetric. Specifically, the normalising constant should be replaced by $\det\left(\tfrac{1}{2}\left[\Sigma + \Sigma^T\right]\right)$. If you don't use this symmetrised form, the manifold of symmetric matrices lies along a minimum and the ML solution is a saddle point (see Fig. 1). For this reason, it seems prudent to use the correct gradients when doing gradient ascent, and to remember to normalise correctly.

Figure 1: a. The expected cost function is symmetric, and the maximum is a ridge. b. The cost function with the incorrect normaliser is not symmetric, and the ML parameter values correspond to a saddle point.
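The saddle point is easy to exhibit numerically. Below is a minimal sketch (my own construction: an arbitrary positive definite $X$, constants and the factor $N$ dropped from the log-likelihood) showing that, with the unsymmetrised normaliser, the objective falls away from the ML point along symmetric directions but rises along antisymmetric ones:

```python
import numpy as np

rng = np.random.default_rng(5)
D = 3
T = rng.standard_normal((D, D))
X = T @ T.T + np.eye(D)          # stand-in scatter matrix (symmetric positive definite)

def ell(P):
    # log-likelihood per eq (46) with the UNsymmetrised normaliser,
    # as a function of P = Sigma^{-1}; constants and the factor N dropped
    return 0.5 * (np.linalg.slogdet(P)[1] - np.trace(P @ X))

Pstar = np.linalg.inv(X)         # the ML solution Sigma = X

M = rng.standard_normal((D, D))
Esym, Easym = (M + M.T) / 2, (M - M.T) / 2
t = 1e-3

assert ell(Pstar + t * Esym) < ell(Pstar)    # a maximum along symmetric directions
assert ell(Pstar + t * Easym) > ell(Pstar)   # but an ascent direction exists: a saddle
```

Symmetrising the argument of the log det (replacing $P$ by $\tfrac{1}{2}(P + P^T)$) makes the antisymmetric directions exactly flat, turning the saddle into the ridge of panel a.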

5 Relation to the vec and Kronecker product operators

The vec, Kronecker product, vec-transpose, and reshape operators shuffle tensors so they can be represented in arrays of different dimensionalities (and sizes). They are most intuitively defined by visual examples, but useful results can be proved using a tensor representation that always involves a tensor product between the object we are transforming and an indicator tensor. In this section our tensor algebra does not need to deal with entry-wise products etc., so to aid the clarity of the presentation we use the more usual suffix notation. The results presented here are easy to generalise using the previous framework (as is needed to relate the entry-wise and Kronecker products, and diag, for example), but the result is less aesthetic.

5.1 Vec

The vec operator lets you represent a matrix as a vector, by stacking the columns. For example:

$$\operatorname{vec}\left[\begin{pmatrix} x_{11} & x_{12} & x_{13}\\ x_{21} & x_{22} & x_{23} \end{pmatrix}\right] = \begin{pmatrix} x_{11}\\ x_{21}\\ x_{12}\\ x_{22}\\ x_{13}\\ x_{23} \end{pmatrix} \tag{51}$$

The tensor representation of this operator, for an $A \times B$ matrix $X$, is:

\begin{align}
x_i &= V_i{}^{ab} X_{ab} \tag{52}\\
V_i{}^{ab} &= \delta_{i,\,a+(b-1)A} \tag{53}
\end{align}

5.2 Kronecker Product

The Kronecker product operator ($\otimes$) lets you represent the outer product of two matrices (a 4th order tensor) as a matrix:

$$\begin{pmatrix} x_{11} & x_{12}\\ x_{21} & x_{22} \end{pmatrix} \otimes Y = \begin{pmatrix} x_{11}Y & x_{12}Y\\ x_{21}Y & x_{22}Y \end{pmatrix} \tag{54}$$

Alternatively, written as a tensor (with $Y$ of size $C \times D$), we have:

\begin{align}
Z_{ij} &= K_{ij}{}^{abcd}\, X_{ab}\, Y_{cd} \tag{55}\\
K_{ij}{}^{abcd} &= \delta_{i,\,c+(a-1)C}\,\delta_{j,\,d+(b-1)D} \tag{56}
\end{align}
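The indicator tensors (53) and (56) can be constructed directly; with 0-based indices the formulae shift by one. A small sketch checking $V$ against column stacking, with np.kron standing in for $K$:

```python
import numpy as np

A_, B_ = 2, 3
X = np.arange(1.0, A_ * B_ + 1).reshape(A_, B_)

# V from eq (53): V_i^{ab} = delta_{i, a + b*A} with 0-based a, b
V = np.zeros((A_ * B_, A_, B_))
for a in range(A_):
    for b in range(B_):
        V[a + b * A_, a, b] = 1.0

vecX = np.einsum('iab,ab->i', V, X)
assert np.allclose(vecX, X.reshape(-1, order='F'))   # column stacking, eq (51)

# K of eq (56) is realised by np.kron: the block layout of eq (54)
Xk = np.array([[1.0, 2.0], [3.0, 4.0]])
Y = np.arange(1.0, 7.0).reshape(2, 3)
Z = np.kron(Xk, Y)
assert np.allclose(Z[:2, :3], Xk[0, 0] * Y)          # top-left block is x_11 Y
```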

9 [(C T A) vec(b)] j = K jabcd C ba A cd V jef B ef (58) = δ,c+(a 1)C δ j,d+(b 1)D C ba A cd δ j,e+(f 1)E B ef (59) = δ,c+(a 1)C δ d,e δ bf C ba A cd B ef (60) = δ,c+(a 1)C C ba A cd B bd (61) = V ca A cd B bd C ba (62) = vec(acb) (63) 5.3 vec-transpose Ths s not the same as reshape n MATLAB despte what Tom Mna cams. compete ths secton 5.4 reshape Reshape generases the vec operator. It aows us to hande hgh dmensona objects easy. For exampe, by remappng tensors nto matrces we can form tensor nverses. We want to be abe to map an array of sze A B C... wth n tota A B C... = N eements nto another array of I J K... = N eements. To do ths we need to specfy where each of the eements n the od array appears n the new array. There are ots of choces for how we ay down the eements n the new array. It woud be usefu to do ths systematcay. One way to do ths s to come up wth a systematc method for numberng each eement n an array, and then we coud ay down eements n the new array such that they w be assgned the same number. A systematc numberng system can be constructed as foows. In a norma countng system, we choose a base B (eg. 10, 2). We can unquey form nteger numbers as a sum of upto B 1 1s, Bs, B 2 s and so on. eg. B = 10, 954 = 9 100 + 5 50 + 4 1 eg. B = 2, 13 = 1 2 3 + 1 2 2 + 0 2 1 + 1 1. The coeffcents are the representaton of the number. Let s defne a new countng system whch has a non-constant base. The ast number w agan represent the number of ones and w tae a vaues from 0 to I 1, the second number j represents the number of Is and taes vaues from 0 to J 1, the thrd number represents the number of I Js and taes vaues from 0 to K 1, and so on. eg. Usng {I, J, K} = {2, 3, 4} the number 21 s 3 6+1 2+1 1 and we can represent I J K = 24 numbers ths way. Now f we assocate + 1, j + 1, + 1,... wth the poston n

5.3 vec-transpose

This is not the same as reshape in MATLAB, despite what Tom Minka claims.

[Complete this section.]

5.4 reshape

Reshape generalises the vec operator. It allows us to handle high dimensional objects easily; for example, by remapping tensors into matrices we can form tensor inverses. We want to be able to map an array of size $A \times B \times C \times \ldots$ with in total $A \cdot B \cdot C \cdots = N$ elements into another array of $I \times J \times K \times \ldots$ with $I \cdot J \cdot K \cdots = N$ elements. To do this we need to specify where each of the elements in the old array appears in the new array. There are lots of choices for how we lay down the elements in the new array, so it would be useful to do this systematically. One way is to come up with a systematic method for numbering each element in an array, and then lay down elements in the new array such that they are assigned the same number.

A systematic numbering system can be constructed as follows. In a normal counting system we choose a base $B$ (e.g. 10, 2), and we can uniquely form integer numbers as a sum of up to $B-1$ ones, $B$s, $B^2$s and so on. E.g. for $B = 10$: $954 = 9 \cdot 100 + 5 \cdot 10 + 4 \cdot 1$; for $B = 2$: $13 = 1 \cdot 2^3 + 1 \cdot 2^2 + 0 \cdot 2^1 + 1 \cdot 1$. The coefficients are the representation of the number. Let's define a new counting system which has a non-constant base. The last number $i$ will again represent the number of ones and will take values from 0 to $I-1$; the second number $j$ represents the number of $I$s and takes values from 0 to $J-1$; the third number $k$ represents the number of $I \cdot J$s and takes values from 0 to $K-1$, and so on. E.g. using $\{I, J, K\} = \{2, 3, 4\}$ the number 21 is $3 \cdot 6 + 1 \cdot 2 + 1 \cdot 1$, and we can represent $I \cdot J \cdot K = 24$ numbers this way. Now if we associate $i+1, j+1, k+1, \ldots$ with the position in an $N$ dimensional array, we have assigned all points in that array a unique integer number that forms a sequence.

\begin{align}
T_{i,j,k,\ldots} &= R^{I,J,K,\ldots;\,A,B,C,\ldots}_{i,j,k,\ldots,\,a,b,c,\ldots}\; S_{a,b,c,\ldots} \tag{64}\\
&= \operatorname{reshape}(S, [I,J,K,\ldots])_{i,j,k,\ldots} \tag{65}\\
R^{I,J,K,\ldots;\,A,B,C,\ldots}_{i,j,k,\ldots,\,a,b,c,\ldots} &= \delta_{a-1+(b-1)A+(c-1)AB+\ldots,\;\; i-1+(j-1)I+(k-1)IJ+\ldots} \tag{66}
\end{align}

5.4.1 Examples

Here are some examples and useful results. From the definition it is simple to show that reshaping a tensor, and then reshaping it back to its original size, returns all the entries to their original positions:

\begin{align}
T_{a,b,c,\ldots} &= R^{A,B,C,\ldots;\,I,J,K,\ldots}_{a,b,c,\ldots,\,i,j,k,\ldots}\;R^{I,J,K,\ldots;\,A,B,C,\ldots}_{i,j,k,\ldots,\,a',b',c',\ldots}\;S_{a',b',c',\ldots} \tag{67}\\
&= \delta_{a-1+(b-1)A+\ldots,\;i-1+(j-1)I+\ldots}\;\delta_{a'-1+(b'-1)A+\ldots,\;i-1+(j-1)I+\ldots}\;S_{a',b',c',\ldots} \tag{68}\\
&= \delta_{a-1+(b-1)A+\ldots,\;a'-1+(b'-1)A+\ldots}\;S_{a',b',c',\ldots} \tag{69}\\
&= \delta_{a,a'}\,\delta_{b,b'}\,\delta_{c,c'}\cdots\,S_{a',b',c',\ldots} \tag{70}\\
&= S_{a,b,c,\ldots} \tag{71}
\end{align}

This result should be obvious: the way we introduced the reshape operator was via a numbering system that is unique for all arrays of a given shape. This means that if we reshape an array and then reshape it again, the result must be equivalent to reshaping directly: intermediate reshapings cannot affect anything.
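MATLAB's reshape, and numpy's with order='F', implement exactly this column-major numbering; a quick check of the numbering and of the round-trip result (67)-(71) (my own illustration):

```python
import numpy as np

I_, J_, K_ = 2, 3, 4
S = np.arange(I_ * J_ * K_).reshape(I_, J_, K_, order='F')

# the mixed-radix number of element (i,j,k) (0-based) is i + j*I + k*I*J
i, j, k = 1, 2, 3
assert S.reshape(-1, order='F')[i + j * I_ + k * I_ * J_] == S[i, j, k]

# reshaping and reshaping back returns every entry home, eqs (67)-(71)
T = S.reshape(4, 3, 2, order='F')
assert np.array_equal(T.reshape(I_, J_, K_, order='F'), S)
```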

A problem where the reshape operator is useful arises in multi-linear models. There we have to solve:

$$\alpha_{d,i,j,\ldots} = g_{d,a,b,\ldots}\;\beta_{a,b,\ldots,i,j,\ldots} \tag{72}$$

where we want $g_{d,a,b,\ldots}$ and we know $\alpha_{d,i,j,\ldots}$ and $\beta_{a,b,\ldots,i,j,\ldots}$. The dimensionalities are $I = A$, $J = B$, .... The solution amounts to finding the inverse of $\beta_{a,b,\ldots,i,j,\ldots}$. Reshaping the left and right hand sides into $[D, Q = I \cdot J \cdots]$ matrices we have:

\begin{align}
R^{D,Q;\,D,I,J,\ldots}_{e,q,\,d,i,j,\ldots}\;\alpha_{d,i,j,\ldots} &= R^{D,Q;\,D,I,J,\ldots}_{e,q,\,d,i,j,\ldots}\;g_{d,a,b,\ldots}\;\beta_{a,b,\ldots,i,j,\ldots} \tag{73}\\
&= \delta_{e+(q-1)D,\;d+(i-1)D+(j-1)DI+\ldots}\;g_{d,a,b,\ldots}\;\beta_{a,b,\ldots,i,j,\ldots} \tag{74}\\
&= \delta_{e,d}\;\delta_{q,\,i+(j-1)I+\ldots}\;g_{d,a,b,\ldots}\;\beta_{a,b,\ldots,i,j,\ldots} \tag{75}\\
&= g_{e,a,b,\ldots}\;\delta_{q,\,i+(j-1)I+\ldots}\;\beta_{a,b,\ldots,i,j,\ldots} \tag{76}
\end{align}

Letting $X = \operatorname{reshape}(\alpha, [D, Q])$, we replace the tensor product on the RHS with a matrix product using the reshape operator again:

\begin{align}
X_{e,q} &= \delta_{a,a'}\,\delta_{b,b'}\cdots\;g_{e,a,b,\ldots}\;\delta_{q,\,i+(j-1)I+\ldots}\;\beta_{a',b',\ldots,i,j,\ldots} \tag{77}\\
&= R^{A,B,\ldots;\,Q}_{a,b,\ldots,\,g}\;R^{Q;\,A,B,\ldots}_{g,\,a',b',\ldots}\;g_{e,a,b,\ldots}\;\delta_{q,\,i+(j-1)I+\ldots}\;\beta_{a',b',\ldots,i,j,\ldots} \tag{78}\\
&= R^{Q;\,A,B,\ldots}_{g,\,a,b,\ldots}\;g_{e,a,b,\ldots}\;\delta_{q,\,i+(j-1)I+\ldots}\;R^{A,B,\ldots;\,Q}_{a',b',\ldots,\,g}\;\beta_{a',b',\ldots,i,j,\ldots} \tag{79}\\
&= R^{D,Q;\,D,A,B,\ldots}_{e,g,\,e',a,b,\ldots}\;g_{e',a,b,\ldots}\;\;R^{Q,Q;\,A,B,\ldots,I,J,\ldots}_{g,q,\,a,b,\ldots,i,j,\ldots}\;\beta_{a,b,\ldots,i,j,\ldots} \tag{80}\\
&= \operatorname{reshape}(g, [D,Q])_{e,g}\;\operatorname{reshape}(\beta, [Q,Q])_{g,q} \tag{81}
\end{align}

Letting $Y = \operatorname{reshape}(\beta, [Q, Q])$, the solution is:

$$g_{d,a,b,\ldots} = \operatorname{reshape}(XY^{-1}, [D,A,B,\ldots])_{d,a,b,\ldots} \tag{82}$$
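Here is a runnable version of (72)-(82) for one concrete shape (an illustrative sketch: beta is a random, almost surely invertible, tensor, and order='F' matches the column-major numbering above):

```python
import numpy as np

rng = np.random.default_rng(7)
D, A_, B_ = 4, 2, 3                 # I = A, J = B, so Q = A*B
Q = A_ * B_

g_true = rng.standard_normal((D, A_, B_))
beta = rng.standard_normal((A_, B_, A_, B_))            # beta_{a,b,i,j}
alpha = np.einsum('dab,abij->dij', g_true, beta)        # eq (72)

# reshape both sides into [D,Q] and [Q,Q] matrices
X = alpha.reshape(D, Q, order='F')
Y = beta.reshape(Q, Q, order='F')

g = (X @ np.linalg.inv(Y)).reshape(D, A_, B_, order='F')   # eq (82)
assert np.allclose(g, g_true)
```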