Tensors for matrix differentiation

Richard Turner

Here are some notes on how to use tensors to find matrix derivatives, and the relation to the $\circ$ (Hadamard), vec, $\otimes$ (Kronecker), vec-transpose and reshape operators. I wrote these notes for myself, and I apologise for any mistakes and confusions. Two sections are currently unfinished: I hope to complete them soon.

1 A tensor notation

Let's set up one useful form of tensor notation, which incorporates the matrix and inner products, the outer product, the Hadamard (MATLAB .* or $\circ$) product, $\operatorname{diag}$ and $\operatorname{diag}^{-1}$. These will be denoted using different combinations of pairs of upstairs and downstairs indices. If we have only 2nd order tensors (and lower), we want to be able to easily convert the result into matrix representation. We have a free choice for the horizontal ordering of indices, and therefore this can be used to denote transposes and the order of multiplication.

\begin{align}
A^i{}_j B^j{}_k &= \sum_j A^i{}_j B^j{}_k \tag{1}\\
&= (AB)^i{}_k \tag{2}\\
A_j{}^i &= (A^T)^i{}_j \tag{3}\\
a^i b_j &= (ab^T)^i{}_j \tag{4}\\
A^i{}_i &= \sum_i A^i{}_i \tag{5}\\
&= \operatorname{tr} A \tag{6}\\
A^i{}_j B^i{}_j &= H^{ij}{}_{klmn} A^k{}_l B^m{}_n \tag{7}\\
&= (A \circ B)^i{}_j \tag{8}\\
H^{ij}{}_{klmn} &= \delta^i_k \delta^j_l \delta^i_m \delta^j_n \tag{9}\\
A^{i,i} &= \operatorname{diag}(A)^i \tag{10}\\
A^i{}_j \delta^i{}_j &= (\operatorname{diag}^{-1} \operatorname{diag} A)^i{}_j \tag{11}
\end{align}

The Kronecker delta $\delta^i_j$ is 1 iff $i = j$ and zero for $i \neq j$. From the matrix perspective, the first indices index the rows and the second the columns. A second order tensor must have one upstairs and one downstairs index. The only way of moving downstairs indices upstairs is to flip ALL indices, and this does not affect anything. Summations occur between one downstairs index and one upstairs index (the Einstein convention). Repeated downstairs or upstairs indices imply a Hadamard or entry-wise product. As an example, the Hadamard product between two vectors (of the same size) is:

$$a \circ b = [a_1, a_2, \ldots, a_I]^T \mathbin{.*} [b_1, b_2, \ldots, b_I]^T = [a_1 b_1, a_2 b_2, \ldots, a_I b_I]^T$$

If we have a bunch of second and/or first order tensors (e.g. $S^i{}_j = B^l{}_j W_k{}^i A_l{}^k$), we can convert them into matrix/vector notation using the order of the indices from left to right, and the rule for the transpose:

1. Make the first and the last indices match the LHS: $S^i{}_j = (W^T)^i{}_k A_l{}^k B^l{}_j$.
2. Transpose the central objects so the indices run consecutively: $S^i{}_j = (W^T)^i{}_k (A^T)^k{}_l B^l{}_j$.
3. Replace with matrix notation: $S = W^T A^T B$.
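The notation maps directly onto numpy's einsum, whose subscript strings play the role of the index pairs above (einsum does not track the upstairs/downstairs distinction, only the contractions). A minimal sketch, with arbitrary sizes and seed:

```python
import numpy as np

rng = np.random.default_rng(0)
A, B = rng.standard_normal((3, 3)), rng.standard_normal((3, 3))
a, b = rng.standard_normal(3), rng.standard_normal(3)

# summed index pair -> matrix product, eqs (1)-(2)
assert np.allclose(np.einsum('ij,jk->ik', A, B), A @ B)
# reversed horizontal index order -> transpose, eq (3)
assert np.allclose(np.einsum('ij->ji', A), A.T)
# two free indices on vectors -> outer product, eq (4)
assert np.allclose(np.einsum('i,j->ij', a, b), np.outer(a, b))
# contracted index pair on one tensor -> trace, eqs (5)-(6)
assert np.isclose(np.einsum('ii->', A), np.trace(A))
# repeated indices at the same level -> Hadamard product, eqs (7)-(8)
assert np.allclose(np.einsum('ij,ij->ij', A, B), A * B)
# diag and diag^{-1} diag, eqs (10)-(11)
assert np.allclose(np.einsum('ii->i', A), np.diag(A))
assert np.allclose(A * np.eye(3), np.diag(np.diag(A)))
```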

2 Basic derivatives

To convert derivatives found using the suffix notation into matrix derivatives, we need to be aware of one more convention. Imagine differentiating a vector $x$ by another vector $y$. The result is a matrix, but we must choose which way round we want the rows and columns. Conventionally the choice is:

$$\frac{\partial x^i}{\partial y^j} = \left(\frac{\partial x}{\partial y}\right)^i{}_j \tag{12}$$

so the chain rule is easy to apply (and intuitive) by a right-hand multiplication:

\begin{align}
\frac{\partial x^i}{\partial z^k} &= \frac{\partial x^i}{\partial y^j}\,\frac{\partial y^j}{\partial z^k} \tag{13}\\
&= \left(\frac{\partial x}{\partial y}\,\frac{\partial y}{\partial z}\right)^i{}_k \tag{14}
\end{align}

We might also want to differentiate a matrix by a matrix. Although the resulting fourth order tensor cannot be represented as a matrix, the object could be required in applying the chain rule (see the example in the following section where we differentiate an object like $\operatorname{tr}(WW^T)$ with respect to $W$), and so we need rules for assigning the indices. Luckily we have enough conventions to unambiguously specify this:

$$\frac{\partial A^i{}_j}{\partial B^k{}_l} = \left(\frac{\partial A}{\partial B}\right)^{i,l}{}_{j,k} \tag{15}$$

where the ordering within the upstairs and downstairs slots is arbitrary. Note that this relation defines all the derivatives you can form with 2nd order tensors and lower (a subset of which are the three types of derivatives that can be represented as matrices).
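A finite-difference check that, under convention (12), Jacobians compose by right-hand multiplication as in (13)-(14); the composed function $x = A\tanh(Bz)$ is an arbitrary illustration, not anything from the text:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 5))
z = rng.standard_normal(5)

y = B @ z                       # y(z)
x = A @ np.tanh(y)              # x(y)

# Convention (12): (dx/dy)_ij = dx_i/dy_j, so:
Jxy = A * (1 - np.tanh(y)**2)   # dx_i/dy_j = A_ij (1 - tanh^2 y_j)
Jyz = B                         # dy_j/dz_k = B_jk
Jxz = Jxy @ Jyz                 # chain rule (13)-(14): right-hand multiplication

# central finite differences of x(z)
eps = 1e-6
num = np.zeros((3, 5))
for k in range(5):
    dz = np.zeros(5); dz[k] = eps
    xp = A @ np.tanh(B @ (z + dz))
    xm = A @ np.tanh(B @ (z - dz))
    num[:, k] = (xp - xm) / (2 * eps)
assert np.allclose(num, Jxz, atol=1e-5)
```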

2.0.1 Some examples

Here are some explicit examples. Find:

$$\frac{df}{dW}, \qquad f = \operatorname{tr}(AWCW^TB) \tag{16}$$

Solution:

\begin{align}
f &= A^i{}_k W^k{}_m C^m{}_n (W^T)^n{}_p B^p{}_i \tag{17}\\
&= A^i{}_k W^k{}_m C^m{}_n W_p{}^n B^p{}_i \tag{18}\\
\frac{df}{dW^a{}_b} &= A^i{}_k\, \delta^k_a \delta^m_b\, C^m{}_n W_p{}^n B^p{}_i \tag{19}\\
&\quad + A^i{}_k W^k{}_m C^m{}_n\, \delta^p_a \delta^n_b\, B^p{}_i \tag{20}\\
&= A^i{}_a C^b{}_n W_p{}^n B^p{}_i \tag{21}\\
&\quad + A^i{}_k W^k{}_m C^m{}_b B^a{}_i \tag{22}\\
&= C^b{}_n (W^T)^n{}_p B^p{}_i A^i{}_a \tag{23}\\
&\quad + (C^T)^b{}_m (W^T)^m{}_k (A^T)^k{}_i (B^T)^i{}_a \tag{24}\\
&= (C W^T B A + C^T W^T A^T B^T)^b{}_a \tag{25}
\end{align}

so that $\frac{df}{dW} = C W^T B A + C^T W^T A^T B^T$. Note the discrepancy with some texts (for example the widely used Matrix Cookbook): their results differ by a transpose, as their definitions for $\partial x/\partial M$ and $\partial M/\partial x$ are not consistent with regard to their use in the chain rule.
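The result (25) is easy to verify by finite differences. The snippet below (arbitrary $4 \times 4$ matrices) also makes the Cookbook remark concrete: the raw entry-wise gradient is the transpose of the matrix in (25), because of the layout convention (15):

```python
import numpy as np

rng = np.random.default_rng(2)
A, W, C, B = (rng.standard_normal((4, 4)) for _ in range(4))
f = lambda W: np.trace(A @ W @ C @ W.T @ B)

analytic = C @ W.T @ B @ A + C.T @ W.T @ A.T @ B.T   # eq (25)

eps = 1e-6
num = np.zeros((4, 4))
for i in range(4):
    for j in range(4):
        E = np.zeros((4, 4)); E[i, j] = eps
        num[i, j] = (f(W + E) - f(W - E)) / (2 * eps)

# num[i,j] = df/dW_ij; under the convention here that entry lives at
# (df/dW)^j_i, so num matches the TRANSPOSE of (25), the Cookbook's layout.
assert np.allclose(num, analytic.T, atol=1e-5)
```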

Find:

$$\frac{d(A^{-1})^i{}_j}{dA^k{}_l} \tag{26}$$

Solution: differentiate the identity $\delta^i{}_j = A^i{}_m (A^{-1})^m{}_j$:

\begin{align}
\frac{d\,\delta^i{}_j}{dA^k{}_l} &= 0 \tag{27}\\
0 &= \frac{dA^i{}_m}{dA^k{}_l}\,(A^{-1})^m{}_j + A^i{}_m\,\frac{d(A^{-1})^m{}_j}{dA^k{}_l} \tag{28}\\
\frac{d(A^{-1})^i{}_j}{dA^k{}_l} &= -(A^{-1})^i{}_k\,(A^{-1})^l{}_j \tag{29}\\
&= -\left[A^{-1}\,\frac{dA}{dA^k{}_l}\,A^{-1}\right]^i{}_j \tag{30}
\end{align}
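A numerical confirmation of (29), with an arbitrary well-conditioned $A$ and an arbitrary index pair $(k, l)$:

```python
import numpy as np

rng = np.random.default_rng(3)
D = 4
A = rng.standard_normal((D, D)) + D * np.eye(D)   # keep A well-conditioned
Ainv = np.linalg.inv(A)

k, l = 1, 3
E = np.zeros((D, D)); E[k, l] = 1.0               # dA/dA^k_l

eps = 1e-6
num = (np.linalg.inv(A + eps * E) - np.linalg.inv(A - eps * E)) / (2 * eps)
pred = -np.outer(Ainv[:, k], Ainv[l, :])          # eq (29): -(A^{-1})_ik (A^{-1})_lj
assert np.allclose(num, pred, atol=1e-5)
```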

3 Differentiating determinants

[Complete this section.]

4 Differentiating structured matrices

Imagine we want to differentiate a matrix with some structure, for example a symmetric matrix. To form the derivatives we can use the matrix-self-derivative:

$$\Gamma^{i,l}{}_{j,k} = \frac{\partial A^i{}_j}{\partial A^k{}_l} \tag{31}$$

so that:

$$\frac{\partial f}{\partial A^k{}_l} = \frac{\partial f}{\partial A^i{}_j}\,\frac{\partial A^i{}_j}{\partial A^k{}_l} \tag{32}$$

When forming differentials we have to be careful to only sum over unique entries in $A$:

$$df = \sum_{\text{unique } k,l} \frac{\partial f}{\partial A^k{}_l}\, dA^k{}_l \tag{33, 34}$$

One ubiquitous form of structured matrices are the symmetric matrices. For this class the matrix-self-derivative is:

$$\frac{\partial S^i{}_j}{\partial S^k{}_l} = \delta^i_k \delta^j_l + \delta^i_l \delta^j_k - \delta^i_j \delta^i_k \delta^i_l \tag{35}$$

The first two terms make sure we count the off-diagonal elements twice, and the third term avoids over-counting of the diagonal and involves some entry-wise products.

4.1 A sermon about symmetric matrices

Let's do a family of derivatives correctly that most people muck up: find $\frac{df(S)}{dS}$ where $S$ is symmetric. Writing $f'(S)^j{}_i = \frac{\partial f(S)}{\partial S^i{}_j}$ for the unconstrained derivative:

\begin{align}
\frac{df(S)}{dS^k{}_l} &= \frac{\partial f(S)}{\partial S^i{}_j}\,\frac{dS^i{}_j}{dS^k{}_l} \tag{36}\\
&= f'(S)^j{}_i \left[\delta^i_k \delta^j_l + \delta^i_l \delta^j_k - \delta^i_j \delta^i_k \delta^i_l\right] \tag{37}\\
&= f'(S)^j{}_i\, \delta^i_k \delta^j_l + f'(S)^j{}_i\, \delta^i_l \delta^j_k - f'(S)^j{}_i\, \delta^i_j \delta^i_k \delta^i_l \tag{38}\\
&= f'(S)^l{}_k + f'(S)^j{}_i\, \delta^i_l \delta^j_k - f'(S)^j{}_i\, \delta^i_j \delta^i_k \delta^i_l \tag{39}\\
&= f'(S)^l{}_k + f'(S)^k{}_l - f'(S)^j{}_i\, \delta^i_j \delta^i_k \delta^i_l \tag{40}\\
&= f'(S)^l{}_k + f'(S)^k{}_l - f'(S)^k{}_k\, \delta^k_l \quad (\text{no sum on } k) \tag{41}\\
&= \left[f'(S) + f'(S)^T - f'(S) \circ I\right]^l{}_k \tag{42}\\
&= \left[2 f'(S) - f'(S) \circ I\right]^l{}_k \tag{43}
\end{align}

where the last line uses the fact that $f'(S)$ is itself symmetric when $S$ is. If we want the differential, we must sum over all the unique elements (and no more):

$$df(S) = \sum_{i \le j} \left[2 f'(S) - f'(S) \circ I\right]^i{}_j\, dS^i{}_j \tag{44}$$
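To check (43)-(44) on a concrete function, take $f(S) = \log\det S$, so that $f'(S) = S^{-1}$; the finite differences below perturb one unique element at a time while keeping $S$ symmetric (an illustrative sketch, arbitrary size and seed):

```python
import numpy as np

rng = np.random.default_rng(4)
D = 4
T = rng.standard_normal((D, D))
S = T @ T.T + D * np.eye(D)               # symmetric positive definite

f = lambda M: np.linalg.slogdet(M)[1]     # f(S) = log det S, so f'(S) = S^{-1}
fp = np.linalg.inv(S)
pred = 2 * fp - np.diag(np.diag(fp))      # eq (43): 2 f'(S) - f'(S) o I

eps = 1e-6
num = np.zeros((D, D))
for k in range(D):
    for l in range(k, D):
        E = np.zeros((D, D))
        E[k, l] = E[l, k] = eps           # perturb one unique element, keeping S symmetric
        num[k, l] = num[l, k] = (f(S + E) - f(S - E)) / (2 * eps)
assert np.allclose(num, pred, atol=1e-5)
```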

Here is a concrete example where people forget the above: imagine finding the ML covariance matrix of a multivariate Gaussian distribution, given some data. The likelihood is given by:

$$P(\{x_n\}_{n=1}^N) = \prod_n \det(2\pi\Sigma)^{-1/2} \exp\left[-\tfrac{1}{2}(x_n - \mu)^T \Sigma^{-1} (x_n - \mu)\right] \tag{45}$$

$$\log P(\{x_n\}_{n=1}^N) = -\frac{N}{2}\left[D \log(2\pi) - \log\det(\Sigma^{-1}) + \operatorname{tr}\left(\Sigma^{-1} X\right)\right] \tag{46}$$

where we have introduced $X = \frac{1}{N}\sum_n (x_n - \mu)(x_n - \mu)^T$. Now we differentiate this with respect to $\Sigma^{-1}$:

\begin{align}
\frac{d \log P(\{x_n\}_{n=1}^N)}{d\Sigma^{-1}} &= \frac{N}{2}\left[\frac{d \log\det(\Sigma^{-1})}{d\Sigma^{-1}} - \frac{d \operatorname{tr}(\Sigma^{-1} X)}{d\Sigma^{-1}}\right] \tag{47}\\
&= \frac{N}{2}\left[2\Sigma - \Sigma \circ I - 2X + X \circ I\right] \tag{48}
\end{align}

These are the correct derivatives. As $2\Sigma - \Sigma \circ I - 2X + X \circ I = 0$ holds iff $\Sigma = X$, people can recover $\Sigma = X$ without using the symmetrised expressions. However, if we wanted to do gradient ascent here, then we should use the following updates:

$$(\Sigma^{-1}_{t+1})^i{}_j = (\Sigma^{-1}_t)^i{}_j + \eta\left[2\Sigma - \Sigma \circ I - 2X + X \circ I\right]^i{}_j \tag{49}$$

applied to the unique (upper triangular) elements of $\Sigma^{-1}$. Most people would instead use:

$$(\Sigma^{-1}_{t+1})^i{}_j = (\Sigma^{-1}_t)^i{}_j + \eta\left[\Sigma - X\right]^i{}_j \tag{50}$$

If we initialise with a symmetric $\Sigma^{-1}_0$ then the incorrect procedure will walk us along the manifold of symmetric matrices towards the ML solution. You might think numerical errors could, in principle, step us off the manifold of symmetric matrices: after all, $(\Sigma^{-1})^i{}_j X^j{}_i$ is invariant so long as $(\Sigma^{-1})^i{}_j + (\Sigma^{-1})^j{}_i = \text{const}$, so there appears to be no pressure to keep $\Sigma^{-1}$ symmetric. Things actually turn out to be worse than this: (45) is not actually a correct expression for a Gaussian distribution if $\Sigma^{-1}$ is not symmetric. Specifically, the normalising constant should be replaced by $\det\left(\tfrac{1}{2}\left[\Sigma + \Sigma^T\right]\right)$. If you don't use this symmetrised form, the manifold of symmetric matrices lies along a minimum and the ML solution is a saddle point (see Fig. 1). For this reason, it seems prudent to use the correct gradients when doing gradient ascent, and to remember to normalise correctly.

Figure 1: a. The expected cost function is symmetric, and the maximum is a ridge. b. The cost function with the incorrect normaliser is not symmetric, and the ML parameter values correspond to a saddle point.
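The saddle point is easy to exhibit numerically. Below is a minimal sketch (my own construction: an arbitrary positive definite $X$, constants and the factor $N$ dropped from the log-likelihood) showing that, with the unsymmetrised normaliser, the objective falls away from the ML point along symmetric directions but rises along antisymmetric ones:

```python
import numpy as np

rng = np.random.default_rng(5)
D = 3
T = rng.standard_normal((D, D))
X = T @ T.T + np.eye(D)          # stand-in scatter matrix (symmetric positive definite)

def ell(P):
    # log-likelihood per eq (46) with the UNsymmetrised normaliser,
    # as a function of P = Sigma^{-1}; constants and the factor N dropped
    return 0.5 * (np.linalg.slogdet(P)[1] - np.trace(P @ X))

Pstar = np.linalg.inv(X)         # the ML solution Sigma = X

M = rng.standard_normal((D, D))
Esym, Easym = (M + M.T) / 2, (M - M.T) / 2
t = 1e-3

assert ell(Pstar + t * Esym) < ell(Pstar)    # a maximum along symmetric directions
assert ell(Pstar + t * Easym) > ell(Pstar)   # but an ascent direction exists: a saddle
```

Symmetrising the argument of the log det (replacing $P$ by $\tfrac{1}{2}(P + P^T)$) makes the antisymmetric directions exactly flat, turning the saddle into the ridge of panel a.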

5 Relation to the vec and Kronecker product operators

The vec, Kronecker product, vec-transpose, and reshape operators shuffle tensors so they can be represented in arrays of different dimensionalities (and sizes). They are most intuitively defined by visual examples, but useful results can be proved using a tensor representation that always involves a tensor product between the object we are transforming and an indicator tensor. In this section our tensor algebra does not need to deal with entry-wise products etc., so to aid the clarity of the presentation we use the more usual suffix notation. The results presented here are easy to generalise using the previous framework (as is needed to relate the entry-wise and Kronecker products, and diag, for example), but the result is less aesthetic.

5.1 Vec

The vec operator lets you represent a matrix as a vector, by stacking the columns. For example:

$$\operatorname{vec}\left[\begin{pmatrix} x_{11} & x_{12} & x_{13}\\ x_{21} & x_{22} & x_{23} \end{pmatrix}\right] = \begin{pmatrix} x_{11}\\ x_{21}\\ x_{12}\\ x_{22}\\ x_{13}\\ x_{23} \end{pmatrix} \tag{51}$$

The tensor representation of this operator, for an $A \times B$ matrix $X$, is:

\begin{align}
x_i &= V_i{}^{ab} X_{ab} \tag{52}\\
V_i{}^{ab} &= \delta_{i,\,a+(b-1)A} \tag{53}
\end{align}

5.2 Kronecker Product

The Kronecker product operator ($\otimes$) lets you represent the outer product of two matrices (a 4th order tensor) as a matrix:

$$\begin{pmatrix} x_{11} & x_{12}\\ x_{21} & x_{22} \end{pmatrix} \otimes Y = \begin{pmatrix} x_{11}Y & x_{12}Y\\ x_{21}Y & x_{22}Y \end{pmatrix} \tag{54}$$

Alternatively, written as a tensor (with $Y$ of size $C \times D$), we have:

\begin{align}
Z_{ij} &= K_{ij}{}^{abcd}\, X_{ab}\, Y_{cd} \tag{55}\\
K_{ij}{}^{abcd} &= \delta_{i,\,c+(a-1)C}\,\delta_{j,\,d+(b-1)D} \tag{56}
\end{align}
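The indicator tensors (53) and (56) can be constructed directly; with 0-based indices the formulae shift by one. A small sketch checking $V$ against column stacking, with np.kron standing in for $K$:

```python
import numpy as np

A_, B_ = 2, 3
X = np.arange(1.0, A_ * B_ + 1).reshape(A_, B_)

# V from eq (53): V_i^{ab} = delta_{i, a + b*A} with 0-based a, b
V = np.zeros((A_ * B_, A_, B_))
for a in range(A_):
    for b in range(B_):
        V[a + b * A_, a, b] = 1.0

vecX = np.einsum('iab,ab->i', V, X)
assert np.allclose(vecX, X.reshape(-1, order='F'))   # column stacking, eq (51)

# K of eq (56) is realised by np.kron: the block layout of eq (54)
Xk = np.array([[1.0, 2.0], [3.0, 4.0]])
Y = np.arange(1.0, 7.0).reshape(2, 3)
Z = np.kron(Xk, Y)
assert np.allclose(Z[:2, :3], Xk[0, 0] * Y)          # top-left block is x_11 Y
```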

9 [(C T A) vec(b)] j = K jabcd C ba A cd V jef B ef (58) = δ,c+(a 1)C δ j,d+(b 1)D C ba A cd δ j,e+(f 1)E B ef (59) = δ,c+(a 1)C δ d,e δ bf C ba A cd B ef (60) = δ,c+(a 1)C C ba A cd B bd (61) = V ca A cd B bd C ba (62) = vec(acb) (63) 5.3 vec-transpose Ths s not the same as reshape n MATLAB despte what Tom Mna cams. compete ths secton 5.4 reshape Reshape generases the vec operator. It aows us to hande hgh dmensona objects easy. For exampe, by remappng tensors nto matrces we can form tensor nverses. We want to be abe to map an array of sze A B C... wth n tota A B C... = N eements nto another array of I J K... = N eements. To do ths we need to specfy where each of the eements n the od array appears n the new array. There are ots of choces for how we ay down the eements n the new array. It woud be usefu to do ths systematcay. One way to do ths s to come up wth a systematc method for numberng each eement n an array, and then we coud ay down eements n the new array such that they w be assgned the same number. A systematc numberng system can be constructed as foows. In a norma countng system, we choose a base B (eg. 10, 2). We can unquey form nteger numbers as a sum of upto B 1 1s, Bs, B 2 s and so on. eg. B = 10, 954 = 9 100 + 5 50 + 4 1 eg. B = 2, 13 = 1 2 3 + 1 2 2 + 0 2 1 + 1 1. The coeffcents are the representaton of the number. Let s defne a new countng system whch has a non-constant base. The ast number w agan represent the number of ones and w tae a vaues from 0 to I 1, the second number j represents the number of Is and taes vaues from 0 to J 1, the thrd number represents the number of I Js and taes vaues from 0 to K 1, and so on. eg. Usng {I, J, K} = {2, 3, 4} the number 21 s 3 6+1 2+1 1 and we can represent I J K = 24 numbers ths way. Now f we assocate + 1, j + 1, + 1,... wth the poston n

5.3 vec-transpose

This is not the same as reshape in MATLAB, despite what Tom Minka claims.

[Complete this section.]

5.4 reshape

Reshape generalises the vec operator. It allows us to handle high dimensional objects easily; for example, by remapping tensors into matrices we can form tensor inverses. We want to be able to map an array of size $A \times B \times C \times \ldots$ with in total $A \cdot B \cdot C \cdots = N$ elements into another array of $I \times J \times K \times \ldots$ with $I \cdot J \cdot K \cdots = N$ elements. To do this we need to specify where each of the elements in the old array appears in the new array. There are lots of choices for how we lay down the elements in the new array, so it would be useful to do this systematically. One way is to come up with a systematic method for numbering each element in an array, and then lay down elements in the new array such that they are assigned the same number.

A systematic numbering system can be constructed as follows. In a normal counting system we choose a base $B$ (e.g. 10, 2), and we can uniquely form integer numbers as a sum of up to $B-1$ ones, $B$s, $B^2$s and so on. E.g. for $B = 10$: $954 = 9 \cdot 100 + 5 \cdot 10 + 4 \cdot 1$; for $B = 2$: $13 = 1 \cdot 2^3 + 1 \cdot 2^2 + 0 \cdot 2^1 + 1 \cdot 1$. The coefficients are the representation of the number. Let's define a new counting system which has a non-constant base. The last number $i$ will again represent the number of ones and will take values from 0 to $I-1$; the second number $j$ represents the number of $I$s and takes values from 0 to $J-1$; the third number $k$ represents the number of $I \cdot J$s and takes values from 0 to $K-1$, and so on. E.g. using $\{I, J, K\} = \{2, 3, 4\}$ the number 21 is $3 \cdot 6 + 1 \cdot 2 + 1 \cdot 1$, and we can represent $I \cdot J \cdot K = 24$ numbers this way. Now if we associate $i+1, j+1, k+1, \ldots$ with the position in an $N$ dimensional array, we have assigned all points in that array a unique integer number that forms a sequence.

\begin{align}
T_{i,j,k,\ldots} &= R^{I,J,K,\ldots;\,A,B,C,\ldots}_{i,j,k,\ldots,\,a,b,c,\ldots}\; S_{a,b,c,\ldots} \tag{64}\\
&= \operatorname{reshape}(S, [I,J,K,\ldots])_{i,j,k,\ldots} \tag{65}\\
R^{I,J,K,\ldots;\,A,B,C,\ldots}_{i,j,k,\ldots,\,a,b,c,\ldots} &= \delta_{a-1+(b-1)A+(c-1)AB+\ldots,\;\; i-1+(j-1)I+(k-1)IJ+\ldots} \tag{66}
\end{align}

5.4.1 Examples

Here are some examples and useful results. From the definition it is simple to show that reshaping a tensor, and then reshaping it back to its original size, returns all the entries to their original positions:

\begin{align}
T_{a,b,c,\ldots} &= R^{A,B,C,\ldots;\,I,J,K,\ldots}_{a,b,c,\ldots,\,i,j,k,\ldots}\;R^{I,J,K,\ldots;\,A,B,C,\ldots}_{i,j,k,\ldots,\,a',b',c',\ldots}\;S_{a',b',c',\ldots} \tag{67}\\
&= \delta_{a-1+(b-1)A+\ldots,\;i-1+(j-1)I+\ldots}\;\delta_{a'-1+(b'-1)A+\ldots,\;i-1+(j-1)I+\ldots}\;S_{a',b',c',\ldots} \tag{68}\\
&= \delta_{a-1+(b-1)A+\ldots,\;a'-1+(b'-1)A+\ldots}\;S_{a',b',c',\ldots} \tag{69}\\
&= \delta_{a,a'}\,\delta_{b,b'}\,\delta_{c,c'}\cdots\,S_{a',b',c',\ldots} \tag{70}\\
&= S_{a,b,c,\ldots} \tag{71}
\end{align}

This result should be obvious: the way we introduced the reshape operator was via a numbering system that is unique for all arrays of a given shape. This means that if we reshape an array and then reshape it again, the result must be equivalent to reshaping directly: intermediate reshapings cannot affect anything.
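MATLAB's reshape, and numpy's with order='F', implement exactly this column-major numbering; a quick check of the numbering and of the round-trip result (67)-(71) (my own illustration):

```python
import numpy as np

I_, J_, K_ = 2, 3, 4
S = np.arange(I_ * J_ * K_).reshape(I_, J_, K_, order='F')

# the mixed-radix number of element (i,j,k) (0-based) is i + j*I + k*I*J
i, j, k = 1, 2, 3
assert S.reshape(-1, order='F')[i + j * I_ + k * I_ * J_] == S[i, j, k]

# reshaping and reshaping back returns every entry home, eqs (67)-(71)
T = S.reshape(4, 3, 2, order='F')
assert np.array_equal(T.reshape(I_, J_, K_, order='F'), S)
```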

A problem where the reshape operator is useful arises in multi-linear models. There we have to solve:

$$\alpha_{d,i,j,\ldots} = g_{d,a,b,\ldots}\;\beta_{a,b,\ldots,i,j,\ldots} \tag{72}$$

where we want $g_{d,a,b,\ldots}$ and we know $\alpha_{d,i,j,\ldots}$ and $\beta_{a,b,\ldots,i,j,\ldots}$. The dimensionalities are $I = A$, $J = B$, .... The solution amounts to finding the inverse of $\beta_{a,b,\ldots,i,j,\ldots}$. Reshaping the left and right hand sides into $[D, Q = I \cdot J \cdots]$ matrices we have:

\begin{align}
R^{D,Q;\,D,I,J,\ldots}_{e,q,\,d,i,j,\ldots}\;\alpha_{d,i,j,\ldots} &= R^{D,Q;\,D,I,J,\ldots}_{e,q,\,d,i,j,\ldots}\;g_{d,a,b,\ldots}\;\beta_{a,b,\ldots,i,j,\ldots} \tag{73}\\
&= \delta_{e+(q-1)D,\;d+(i-1)D+(j-1)DI+\ldots}\;g_{d,a,b,\ldots}\;\beta_{a,b,\ldots,i,j,\ldots} \tag{74}\\
&= \delta_{e,d}\;\delta_{q,\,i+(j-1)I+\ldots}\;g_{d,a,b,\ldots}\;\beta_{a,b,\ldots,i,j,\ldots} \tag{75}\\
&= g_{e,a,b,\ldots}\;\delta_{q,\,i+(j-1)I+\ldots}\;\beta_{a,b,\ldots,i,j,\ldots} \tag{76}
\end{align}

Letting $X = \operatorname{reshape}(\alpha, [D, Q])$, we replace the tensor product on the RHS with a matrix product using the reshape operator again:

\begin{align}
X_{e,q} &= \delta_{a,a'}\,\delta_{b,b'}\cdots\;g_{e,a,b,\ldots}\;\delta_{q,\,i+(j-1)I+\ldots}\;\beta_{a',b',\ldots,i,j,\ldots} \tag{77}\\
&= R^{A,B,\ldots;\,Q}_{a,b,\ldots,\,g}\;R^{Q;\,A,B,\ldots}_{g,\,a',b',\ldots}\;g_{e,a,b,\ldots}\;\delta_{q,\,i+(j-1)I+\ldots}\;\beta_{a',b',\ldots,i,j,\ldots} \tag{78}\\
&= R^{Q;\,A,B,\ldots}_{g,\,a,b,\ldots}\;g_{e,a,b,\ldots}\;\delta_{q,\,i+(j-1)I+\ldots}\;R^{A,B,\ldots;\,Q}_{a',b',\ldots,\,g}\;\beta_{a',b',\ldots,i,j,\ldots} \tag{79}\\
&= R^{D,Q;\,D,A,B,\ldots}_{e,g,\,e',a,b,\ldots}\;g_{e',a,b,\ldots}\;\;R^{Q,Q;\,A,B,\ldots,I,J,\ldots}_{g,q,\,a,b,\ldots,i,j,\ldots}\;\beta_{a,b,\ldots,i,j,\ldots} \tag{80}\\
&= \operatorname{reshape}(g, [D,Q])_{e,g}\;\operatorname{reshape}(\beta, [Q,Q])_{g,q} \tag{81}
\end{align}

Letting $Y = \operatorname{reshape}(\beta, [Q, Q])$, the solution is:

$$g_{d,a,b,\ldots} = \operatorname{reshape}(XY^{-1}, [D,A,B,\ldots])_{d,a,b,\ldots} \tag{82}$$
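Here is a runnable version of (72)-(82) for one concrete shape (an illustrative sketch: beta is a random, almost surely invertible, tensor, and order='F' matches the column-major numbering above):

```python
import numpy as np

rng = np.random.default_rng(7)
D, A_, B_ = 4, 2, 3                 # I = A, J = B, so Q = A*B
Q = A_ * B_

g_true = rng.standard_normal((D, A_, B_))
beta = rng.standard_normal((A_, B_, A_, B_))            # beta_{a,b,i,j}
alpha = np.einsum('dab,abij->dij', g_true, beta)        # eq (72)

# reshape both sides into [D,Q] and [Q,Q] matrices
X = alpha.reshape(D, Q, order='F')
Y = beta.reshape(Q, Q, order='F')

g = (X @ np.linalg.inv(Y)).reshape(D, A_, B_, order='F')   # eq (82)
assert np.allclose(g, g_true)
```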