Jacobi's formula for the derivative of a determinant

Peter Haggstrom
www.gotohaggstrom.com
mathsatbondibeach@gmail.com

January 4, 2018

1 Introduction

Suppose the elements of an $n \times n$ matrix $A$ depend on a parameter $t$, say (in general it could be several parameters). How do you find an expression for the derivative of the determinant of $A$? This is where Jacobi's formula arises. In what follows the elements of $A(t)$ will have their $t$ dependence suppressed and simply be referred to as $a_{ij}$, where $i$ refers to rows and $j$ refers to columns. The determinant of $A(t)$ will be written $|A|$ (again with the $t$ dependence suppressed). With this notational background, Jacobi's formula is as follows:

$$\frac{d|A|}{dt} = |A|\,\mathrm{tr}\Big(A^{-1}\frac{dA}{dt}\Big) \tag{1}$$

where:

$$\mathrm{tr}(X) = \mathrm{trace}(X) = \sum_{i=1}^{n} x_{ii}$$

and $A^{-1}$ is the inverse of $A$, so clearly it is assumed that the matrix is invertible, i.e. $|A| \neq 0$.

There are various ways of proving (1). Terry Tao provides a proof in his blog which is based on the linearization of the matrix ([1]). Thus near the identity,
the determinant behaves like the trace, or more precisely one has, for a bounded square matrix $A$ and infinitesimal $\epsilon$:

$$\det(1 + \epsilon A) = 1 + \epsilon\,\mathrm{tr}(A) + O(\epsilon^2) \tag{2}$$

However, such proofs, while instructive at one level, abstract away all the gory detail of the basic definitions one is taught in basic linear algebra courses. In order to prove (1) using those definitions we have to recount the basics. If you are across the details then go straight to the proof section.

There are several ways of defining the determinant of a square matrix and none is particularly simple. For instance, one could start with the definition of the determinant based on permutation concepts:

$$|A| = \sum_{\sigma} (\mathrm{sgn}\,\sigma)\, a_{1j_1} a_{2j_2} \cdots a_{nj_n} \tag{3}$$

where the sum runs over all permutations $\sigma = (j_1, j_2, \ldots, j_n)$ of $(1, 2, \ldots, n)$ and $\mathrm{sgn}\,\sigma$ gives the parity or sign of the permutation $\sigma$. Good luck using that definition!

The classical expansion of a determinant is due to Laplace and is:

$$|A| = a_{i1}A_{i1} + a_{i2}A_{i2} + \cdots + a_{in}A_{in} = \sum_{j=1}^{n} a_{ij}A_{ij} \tag{4}$$

Here $A_{ij}$ is a co-factor and (4) represents an expansion by co-factors along row $i$. The expansion down column $j$ is given by:

$$|A| = a_{1j}A_{1j} + a_{2j}A_{2j} + \cdots + a_{nj}A_{nj} = \sum_{i=1}^{n} a_{ij}A_{ij} \tag{5}$$

The co-factors $A_{ij}$ are obtained as follows. If we let $M_{ij}$ be the $(n-1) \times (n-1)$ sub-matrix of $A$ obtained by deleting its $i$th row and $j$th column, the determinant of $M_{ij}$ is called the minor of the element $a_{ij}$ of $A$. Thus:

$$A_{ij} = (-1)^{i+j} |M_{ij}| \tag{6}$$

You can remember the pattern of signs by thinking of a chessboard pattern with $+$'s on the main diagonal.
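The permutation definition (3) is hopeless by hand but trivial to script, and the same code can check the near-identity expansion (2). The following is a minimal pure-Python sketch (the helper names `perm_sign` and `det` are my own, not from the text): it sums over all permutations and then verifies that $\det(1 + \epsilon A)$ agrees with $1 + \epsilon\,\mathrm{tr}(A)$ up to $O(\epsilon^2)$.

```python
from itertools import permutations
from math import prod

def perm_sign(p):
    # parity (sgn) of a permutation given as a tuple of 0-based indices:
    # sort it into the identity by transpositions, flipping the sign each time
    p, s = list(p), 1
    for i in range(len(p)):
        while p[i] != i:
            j = p[i]
            p[i], p[j] = p[j], p[i]
            s = -s
    return s

def det(A):
    # permutation (Leibniz) formula (3): sum over sigma of sgn(sigma) * prod a_{i, sigma(i)}
    n = len(A)
    return sum(perm_sign(p) * prod(A[i][p[i]] for i in range(n))
               for p in permutations(range(n)))

A = [[1.0, 2.0], [3.0, 4.0]]
eps = 1e-6
I_eps = [[(1.0 if i == j else 0.0) + eps * A[i][j] for j in range(2)] for i in range(2)]
# det(1 + eps*A) - (1 + eps*tr(A)) should be O(eps^2), as in (2)
print(det(I_eps) - (1.0 + eps * (A[0][0] + A[1][1])))
```

The $n!$ terms make this definition unusable beyond small $n$, which is exactly the point of the "good luck" remark; the Laplace expansion below is the practical workhorse.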
$$\begin{pmatrix} + & - & + & \cdots \\ - & + & - & \cdots \\ + & - & + & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{pmatrix}$$

Example

$$A = \begin{pmatrix} 1 & 2 & 3 \\ 1 & 2 & 1 \\ 1 & 0 & 1 \end{pmatrix}$$

Therefore, expanding along the first row:

$$|A| = (+1)\begin{vmatrix} 2 & 1 \\ 0 & 1 \end{vmatrix} - 2\begin{vmatrix} 1 & 1 \\ 1 & 1 \end{vmatrix} + 3\begin{vmatrix} 1 & 2 \\ 1 & 0 \end{vmatrix} = 2 - 0 - 6 = -4 \tag{7}$$

So we can write the determinant as (expanding by the $i$th row):

$$|A| = \sum_{j=1}^{n} (-1)^{i+j} a_{ij} |M_{ij}| \tag{8}$$

From basic matrix theory we note that the inverse is calculated as follows (eg see [2], page 30):

$$A^{-1} = \frac{1}{|A|}(\mathrm{adj}\,A) \tag{9}$$

where $\mathrm{adj}\,A$ is the adjoint of $A$ and it is calculated as follows. The adjoint of an $n \times n$ matrix is an $n \times n$ matrix whose $(i,j)$ element is:

$$(\mathrm{adj}\,A)_{ij} = (-1)^{i+j}|A_{ji}| = \alpha_{ji} \tag{10}$$

where $A_{ij}$ here denotes the matrix obtained from $A$ by deleting the $i$th row and $j$th column. Using the minor notation from (6) we see that:

$$(\mathrm{adj}\,A)_{ij} = (-1)^{i+j}|M_{ji}| \tag{11}$$

It is important to note the transposition in (10)-(11) and for this reason the adjoint is often expressed as:
$$\mathrm{adj}\,A = (\alpha_{ij})^{T} \tag{12}$$

where $\alpha_{ij} = (-1)^{i+j}|A_{ij}| = (-1)^{i+j}|M_{ij}|$.

Example

Suppose $A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} = \begin{pmatrix} 2t & t \\ -t & 3t \end{pmatrix}$. Then:

$$\mathrm{adj}\,A = \begin{pmatrix} a_{22} & -a_{12} \\ -a_{21} & a_{11} \end{pmatrix}$$

Hence $A^{-1} = \dfrac{1}{|A|}\begin{pmatrix} a_{22} & -a_{12} \\ -a_{21} & a_{11} \end{pmatrix}$.

To get some feel for how one might calculate the derivative of the determinant of a matrix with respect to a parameter, take the simple $2 \times 2$ case. The determinant is a function of the matrix entries, so let us consider $f(A) = f(a_{11}, a_{12}, a_{21}, a_{22})$ (remember the $t$ dependency is suppressed for convenience). Since the $a_{ij}$ depend only on $t$, the chain rule gives:

$$\frac{df}{dt} = \sum_{i=1}^{2}\sum_{j=1}^{2} \frac{\partial f}{\partial a_{ij}}\frac{da_{ij}}{dt} \tag{13}$$

For the matrix above we know that:

$$|A| = 2t \cdot 3t - t \cdot (-t) = 7t^2 \tag{14}$$

Hence:

$$A^{-1} = \frac{1}{7t^2}\begin{pmatrix} 3t & -t \\ t & 2t \end{pmatrix} \tag{15}$$

$$\frac{dA}{dt} = \begin{pmatrix} 2 & 1 \\ -1 & 3 \end{pmatrix} \tag{16}$$

Thus:

$$|A|\,\mathrm{tr}\Big(A^{-1}\frac{dA}{dt}\Big) = 7t^2 \cdot \frac{1}{7t^2}\,\mathrm{tr}\left(\begin{pmatrix} 3t & -t \\ t & 2t \end{pmatrix}\begin{pmatrix} 2 & 1 \\ -1 & 3 \end{pmatrix}\right) = \mathrm{tr}\begin{pmatrix} 7t & 0 \\ 0 & 7t \end{pmatrix} = 14t \tag{17}$$
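The computation in (17) can be spot-checked numerically at a particular value of $t$. The sketch below (pure Python; the helper names `det2` and `A` are mine, not from the text) takes $A(t) = \begin{pmatrix} 2t & t \\ -t & 3t \end{pmatrix}$ as in the example, forms $|A|\,\mathrm{tr}(A^{-1}\,dA/dt)$ at $t = 2$, and compares it against a central finite difference of $|A(t)|$; both should give $14t = 28$.

```python
def det2(M):
    # 2x2 determinant: a11*a22 - a12*a21
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def A(t):
    # the 2x2 example matrix from the text
    return [[2 * t, t], [-t, 3 * t]]

t = 2.0
At = A(t)
d = det2(At)  # |A| = 7 t^2 = 28 at t = 2
# A^{-1} = (1/|A|) adj A, using the 2x2 adjoint formula
inv = [[At[1][1] / d, -At[0][1] / d],
       [-At[1][0] / d, At[0][0] / d]]
dA = [[2, 1], [-1, 3]]  # dA/dt, entrywise derivative of A(t)
# P = A^{-1} dA/dt; Jacobi's formula (1) says |A| tr(P) = d|A|/dt
P = [[sum(inv[i][k] * dA[k][j] for k in range(2)) for j in range(2)]
     for i in range(2)]
jacobi = d * (P[0][0] + P[1][1])
h = 1e-6
fd = (det2(A(t + h)) - det2(A(t - h))) / (2 * h)  # numerical d|A|/dt
print(jacobi, fd)  # both approximately 14 t = 28
```

Since $|A(t)| = 7t^2$ is quadratic, the central difference is exact up to floating-point rounding, so the two numbers agree essentially to machine precision.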
To calculate the derivative of the determinant directly we use (13) with:

$$f(A) = f(a_{11}, a_{12}, a_{21}, a_{22}) = a_{11}a_{22} - a_{12}a_{21} \tag{18}$$

Thus:

$$\frac{\partial f}{\partial a_{11}} = a_{22} \qquad \frac{\partial f}{\partial a_{12}} = -a_{21} \qquad \frac{\partial f}{\partial a_{21}} = -a_{12} \qquad \frac{\partial f}{\partial a_{22}} = a_{11} \tag{19}$$

Also:

$$\frac{da_{11}}{dt} = 2 \qquad \frac{da_{12}}{dt} = 1 \qquad \frac{da_{21}}{dt} = -1 \qquad \frac{da_{22}}{dt} = 3 \tag{20}$$

So formula (13) gives:

$$\frac{df}{dt} = \sum_{i=1}^{2}\sum_{j=1}^{2} \frac{\partial f}{\partial a_{ij}}\frac{da_{ij}}{dt} = 3t \cdot 2 + t \cdot 1 + (-t)(-1) + 2t \cdot 3 = 14t \tag{21}$$

From (14), $\dfrac{d|A|}{dt} = \dfrac{d}{dt}(7t^2) = 14t$, and Jacobi's formula (1) gives the same answer:

$$|A|\,\mathrm{tr}\Big(A^{-1}\frac{dA}{dt}\Big) = 7t^2 \cdot \frac{2}{t} = 14t \tag{22}$$

which matches (17). So the method works. All we need to do now is generalise.
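The chain-rule sum in (19)-(21) is mechanical enough to script. A minimal sketch (variable names are mine), evaluating at $t = 3$, where $14t = 42$:

```python
t = 3.0
# entries of the example matrix A(t) = (2t, t; -t, 3t)
a11, a12, a21, a22 = 2 * t, t, -t, 3 * t
# partial derivatives of f = a11*a22 - a12*a21, as in (19)
grad = {"a11": a22, "a12": -a21, "a21": -a12, "a22": a11}
# da_ij/dt, as in (20)
rate = {"a11": 2, "a12": 1, "a21": -1, "a22": 3}
# chain rule (13): df/dt = sum over entries of (df/da_ij)(da_ij/dt)
df_dt = sum(grad[k] * rate[k] for k in grad)
print(df_dt)  # 14 t = 42.0
```

Each term mirrors one summand of (21): $3t \cdot 2 + t \cdot 1 + (-t)(-1) + 2t \cdot 3 = 14t$.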
2 The proof

Using (13) with $f(A) = |A|$ we have:

$$\frac{df(A)}{dt} = \sum_{i=1}^{n}\sum_{j=1}^{n} \frac{\partial |A|}{\partial a_{ij}}\frac{da_{ij}}{dt} \tag{23}$$

Now, expanding $|A|$ along row $i$ as in (4) and applying the product rule:

$$\begin{aligned}
\frac{\partial |A|}{\partial a_{ij}} &= \frac{\partial}{\partial a_{ij}}\sum_{k=1}^{n} a_{ik}A_{ik} \\
&= \sum_{k=1}^{n}\Big(\frac{\partial a_{ik}}{\partial a_{ij}}A_{ik} + a_{ik}\frac{\partial A_{ik}}{\partial a_{ij}}\Big) \\
&= \sum_{k=1}^{n}\Big(\underbrace{\frac{\partial a_{ik}}{\partial a_{ij}}(-1)^{i+k}|M_{ik}|}_{\text{using (6)}} + \underbrace{(-1)^{i+k}a_{ik}\frac{\partial |M_{ik}|}{\partial a_{ij}}}_{\text{using (6)}}\Big)
\end{aligned} \tag{24}$$

The fundamental insight to have at this point is that:

$$\frac{\partial}{\partial a_{ij}}\big[(-1)^{i+k}|M_{ik}|\big] = 0 \tag{25}$$

That this is so is a direct result of the definition of $M_{ik}$, which is the matrix obtained by deleting the $i$th row and $k$th column from $A$. Thus $M_{ik}$ cannot contain $a_{ij}$, and so (25) results. If you can't see that immediately, consider this matrix:

$$A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}$$

Then:

$$|M_{11}| = \begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix} = a_{22}a_{33} - a_{32}a_{23}$$

So:

$$\frac{\partial |M_{11}|}{\partial a_{1k}} = 0 \quad \text{for } k = 1, 2, 3$$
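The claim in (25), namely that no minor $M_{ik}$ along row $i$ contains any entry of row $i$, can also be checked mechanically: perturb an entry of the first row and confirm that none of the first-row minors change. A small sketch (the helper names `det2` and `minor` are mine):

```python
def det2(M):
    # 2x2 determinant
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def minor(A, i, k):
    # M_ik: the sub-matrix of A with row i and column k deleted (0-based here)
    return [[A[r][c] for c in range(len(A)) if c != k]
            for r in range(len(A)) if r != i]

A = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0],
     [7.0, 8.0, 10.0]]
before = [det2(minor(A, 0, k)) for k in range(3)]
A[0][1] += 100.0  # perturb a_12, an entry of row 1
after = [det2(minor(A, 0, k)) for k in range(3)]
print(before == after)  # True: no M_1k contains any entry of row 1
```

This is exactly why the second group of terms in (24) vanishes.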
The next thing to note is that:

$$\frac{\partial a_{ik}}{\partial a_{ij}} = \delta_{kj} = \begin{cases} 1 & \text{if } k = j \\ 0 & \text{if } k \neq j \end{cases} \tag{26}$$

Thus (24) reduces to:

$$\frac{\partial |A|}{\partial a_{ij}} = \sum_{k=1}^{n} (-1)^{i+k}|M_{ik}|\,\delta_{kj} = (-1)^{i+j}|M_{ij}| \tag{27}$$

From (23):

$$\frac{df(A)}{dt} = \sum_{i=1}^{n}\sum_{j=1}^{n} \frac{\partial |A|}{\partial a_{ij}}\frac{da_{ij}}{dt} = \sum_{i=1}^{n}\sum_{j=1}^{n} (-1)^{i+j}|M_{ij}|\frac{da_{ij}}{dt} \tag{28}$$

If $C = \dfrac{dA}{dt}(\mathrm{adj}\,A)$ then:

$$c_{ij} = \sum_{k=1}^{n} \frac{da_{ik}}{dt}(\mathrm{adj}\,A)_{kj} = \sum_{k=1}^{n} \frac{da_{ik}}{dt}\underbrace{(-1)^{k+j}|M_{jk}|}_{\text{recall (11)}} \tag{29}$$

Hence:

$$\mathrm{tr}(C) = \sum_{i=1}^{n} c_{ii} = \sum_{i=1}^{n}\sum_{k=1}^{n} \frac{da_{ik}}{dt}(-1)^{k+i}|M_{ik}| = \sum_{i=1}^{n}\sum_{j=1}^{n} \frac{da_{ij}}{dt}(-1)^{i+j}|M_{ij}| \tag{30}$$
where in the last line we have simply changed the second index of summation from $k$ to $j$ appropriately. Thus (28) and (30) establish that

$$\frac{d|A|}{dt} = \mathrm{tr}\Big(\frac{dA}{dt}\,\mathrm{adj}\,A\Big) = |A|\,\mathrm{tr}\Big(A^{-1}\frac{dA}{dt}\Big)$$

since $\mathrm{adj}\,A = |A|\,A^{-1}$. A useful paper dealing with matrix derivatives is by Steven W Nydick of the University of Minnesota [3].

3 References

[1] Terence Tao, https://terrytao.wordpress.com/2013/01/13/matrix-identities-as-derivatives-of-determinant-identities/

[2] Michael Artin, Algebra, Prentice Hall, 1991

[3] Steven W Nydick, With(out) a trace - matrix derivatives the easy way, http://www.tc.umn.edu/~nydic00/docs/unpubs/schonemann_trace_derivatives_Presentation.pdf

4 History

Created 29 April 2017

04/01/2018 Typos in equations (23) and (28): $\frac{df(A)}{dt}$ instead of $\frac{df}{dt}$. Spotted by Henrik Tirsted from Copenhagen. Thank you Henrik!