Matrix Differentiation CS5240 Theoretical Foundations in Multimedia Leow Wee Kheng Department of Computer Science School of Computing National University of Singapore Leow Wee Kheng (NUS) Matrix Differentiation 1 / 35
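Matrix Derivatives

There are 6 common types of matrix derivatives. Columns give the type of the quantity being differentiated; rows give the type of the variable.

\[
\begin{array}{l|ccc}
\text{Type} & \text{Scalar} & \text{Vector} & \text{Matrix} \\ \hline
\text{Scalar} & \frac{\partial y}{\partial x} & \frac{\partial \mathbf{y}}{\partial x} & \frac{\partial \mathbf{Y}}{\partial x} \\
\text{Vector} & \frac{\partial y}{\partial \mathbf{x}} & \frac{\partial \mathbf{y}}{\partial \mathbf{x}} & \\
\text{Matrix} & \frac{\partial y}{\partial \mathbf{X}} & &
\end{array}
\]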
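Derivatives by Scalar

Numerator Layout Notation
\[
\frac{\partial \mathbf{y}}{\partial x} =
\begin{bmatrix} \frac{\partial y_1}{\partial x} \\ \vdots \\ \frac{\partial y_m}{\partial x} \end{bmatrix},
\qquad
\frac{\partial \mathbf{Y}}{\partial x} =
\begin{bmatrix}
\frac{\partial y_{11}}{\partial x} & \cdots & \frac{\partial y_{1n}}{\partial x} \\
\vdots & \ddots & \vdots \\
\frac{\partial y_{m1}}{\partial x} & \cdots & \frac{\partial y_{mn}}{\partial x}
\end{bmatrix}
\]

Denominator Layout Notation
\[
\frac{\partial \mathbf{y}}{\partial x} =
\begin{bmatrix} \frac{\partial y_1}{\partial x} & \cdots & \frac{\partial y_m}{\partial x} \end{bmatrix}
\]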
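Derivatives by Vector

Numerator Layout Notation
\[
\frac{\partial y}{\partial \mathbf{x}} =
\begin{bmatrix} \frac{\partial y}{\partial x_1} & \cdots & \frac{\partial y}{\partial x_n} \end{bmatrix},
\qquad
\frac{\partial \mathbf{y}}{\partial \mathbf{x}} =
\begin{bmatrix}
\frac{\partial y_1}{\partial x_1} & \cdots & \frac{\partial y_1}{\partial x_n} \\
\vdots & \ddots & \vdots \\
\frac{\partial y_m}{\partial x_1} & \cdots & \frac{\partial y_m}{\partial x_n}
\end{bmatrix}
\]

Denominator Layout Notation
\[
\frac{\partial y}{\partial \mathbf{x}} =
\begin{bmatrix} \frac{\partial y}{\partial x_1} \\ \vdots \\ \frac{\partial y}{\partial x_n} \end{bmatrix},
\qquad
\frac{\partial \mathbf{y}}{\partial \mathbf{x}} =
\begin{bmatrix}
\frac{\partial y_1}{\partial x_1} & \cdots & \frac{\partial y_m}{\partial x_1} \\
\vdots & \ddots & \vdots \\
\frac{\partial y_1}{\partial x_n} & \cdots & \frac{\partial y_m}{\partial x_n}
\end{bmatrix}
\]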
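Derivative by Matrix

Numerator Layout Notation
\[
\frac{\partial y}{\partial \mathbf{X}} =
\begin{bmatrix}
\frac{\partial y}{\partial x_{11}} & \cdots & \frac{\partial y}{\partial x_{m1}} \\
\vdots & \ddots & \vdots \\
\frac{\partial y}{\partial x_{1n}} & \cdots & \frac{\partial y}{\partial x_{mn}}
\end{bmatrix}
\]

Denominator Layout Notation
\[
\frac{\partial y}{\partial \mathbf{X}} =
\begin{bmatrix}
\frac{\partial y}{\partial x_{11}} & \cdots & \frac{\partial y}{\partial x_{1n}} \\
\vdots & \ddots & \vdots \\
\frac{\partial y}{\partial x_{m1}} & \cdots & \frac{\partial y}{\partial x_{mn}}
\end{bmatrix}
\]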
Pictorial Representation

[Figure: pictorial comparison of numerator layout and denominator layout.]
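Caution

Most books and papers don't state which convention they use. Reference [2] uses both conventions but clearly differentiates them:
\[
\frac{\partial y}{\partial \mathbf{x}^\top} =
\begin{bmatrix} \frac{\partial y}{\partial x_1} & \cdots & \frac{\partial y}{\partial x_n} \end{bmatrix},
\qquad
\frac{\partial y}{\partial \mathbf{x}} =
\begin{bmatrix} \frac{\partial y}{\partial x_1} \\ \vdots \\ \frac{\partial y}{\partial x_n} \end{bmatrix}
\]
\[
\frac{\partial \mathbf{y}}{\partial \mathbf{x}^\top} =
\begin{bmatrix}
\frac{\partial y_1}{\partial x_1} & \cdots & \frac{\partial y_1}{\partial x_n} \\
\vdots & \ddots & \vdots \\
\frac{\partial y_m}{\partial x_1} & \cdots & \frac{\partial y_m}{\partial x_n}
\end{bmatrix},
\qquad
\frac{\partial \mathbf{y}^\top}{\partial \mathbf{x}} =
\begin{bmatrix}
\frac{\partial y_1}{\partial x_1} & \cdots & \frac{\partial y_m}{\partial x_1} \\
\vdots & \ddots & \vdots \\
\frac{\partial y_1}{\partial x_n} & \cdots & \frac{\partial y_m}{\partial x_n}
\end{bmatrix}
\]

It is best not to mix the two conventions in your equations. We adopt numerator layout notation.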
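The two layouts differ only by a transpose, which a small numerical experiment makes concrete. The following is a minimal sketch and not part of the original slides: the test function `f`, the helper names `jacobian_numerator` and `transpose`, and the step size `h` are illustrative assumptions. It approximates the derivative of a map from R^2 to R^3 by central differences and arranges it in both layouts.

```python
def jacobian_numerator(f, x, h=1e-6):
    """Central-difference Jacobian in numerator layout: entry (i, j) = dy_i/dx_j."""
    m, n = len(f(x)), len(x)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        xp = list(x); xp[j] += h
        xm = list(x); xm[j] -= h
        fp, fm = f(xp), f(xm)
        for i in range(m):
            J[i][j] = (fp[i] - fm[i]) / (2 * h)
    return J

def transpose(M):
    return [list(row) for row in zip(*M)]

def f(x):
    # Hypothetical example map y = f(x) with m = 3 outputs and n = 2 inputs.
    return [x[0] * x[1], x[0] + x[1], x[1] ** 2]

x = [2.0, 3.0]
J_num = jacobian_numerator(f, x)  # numerator layout: m x n = 3 x 2
J_den = transpose(J_num)          # denominator layout: n x m = 2 x 3

print(len(J_num), "x", len(J_num[0]))
print(len(J_den), "x", len(J_den[0]))
```

At x = (2, 3) the exact numerator-layout result is [[3, 2], [1, 1], [0, 6]], so the shape alone reveals which convention a formula uses whenever m and n differ.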
Commonly Used Derivatives

Here, scalar $a$, vector $\mathbf{a}$ and matrix $\mathbf{A}$ are not functions of $x$ and $\mathbf{x}$.

(C1) $\frac{d\mathbf{a}}{dx} = \mathbf{0}$ (column matrix)
(C2) $\frac{da}{d\mathbf{x}} = \mathbf{0}^\top$ (row matrix)
(C3) $\frac{da}{d\mathbf{X}} = \mathbf{0}^\top$ (matrix)
(C4) $\frac{d\mathbf{a}}{d\mathbf{x}} = \mathbf{0}$ (matrix)
(C5) $\frac{d\mathbf{x}}{d\mathbf{x}} = \mathbf{I}$
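(C6) $\frac{d\,\mathbf{a}^\top \mathbf{x}}{d\mathbf{x}} = \frac{d\,\mathbf{x}^\top \mathbf{a}}{d\mathbf{x}} = \mathbf{a}^\top$
(C7) $\frac{d\,\mathbf{x}^\top \mathbf{x}}{d\mathbf{x}} = 2\mathbf{x}^\top$
(C8) $\frac{d\,(\mathbf{x}^\top \mathbf{a})^2}{d\mathbf{x}} = 2\,\mathbf{x}^\top \mathbf{a}\,\mathbf{a}^\top$
(C9) $\frac{d\,\mathbf{A}\mathbf{x}}{d\mathbf{x}} = \mathbf{A}$
(C10) $\frac{d\,\mathbf{x}^\top \mathbf{A}}{d\mathbf{x}} = \mathbf{A}^\top$
(C11) $\frac{d\,\mathbf{x}^\top \mathbf{A}\mathbf{x}}{d\mathbf{x}} = \mathbf{x}^\top (\mathbf{A} + \mathbf{A}^\top)$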
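Identities such as (C8) and (C11) are easy to check numerically. The sketch below is an illustration only: the helper names `grad_row`, `dot` and `matvec`, the test values of `A`, `a` and `x`, and the step size `h` are assumptions, not part of the slides. It compares a central-difference derivative, arranged as a row as in numerator layout, against the closed-form results.

```python
def grad_row(f, x, h=1e-6):
    """Central-difference derivative of scalar f by vector x, as a row (numerator layout)."""
    g = []
    for j in range(len(x)):
        xp = list(x); xp[j] += h
        xm = list(x); xm[j] -= h
        g.append((f(xp) - f(xm)) / (2 * h))
    return g

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def matvec(A, x):
    return [dot(row, x) for row in A]

A = [[1.0, 2.0], [3.0, 4.0]]  # deliberately non-symmetric, so A and A^T differ
a = [3.0, 1.0]
x = [1.0, -2.0]

# (C8): d (x^T a)^2 / dx = 2 x^T a a^T
c8_numeric = grad_row(lambda z: dot(z, a) ** 2, x)
c8_formula = [2 * dot(x, a) * ai for ai in a]

# (C11): d x^T A x / dx = x^T (A + A^T)
c11_numeric = grad_row(lambda z: dot(z, matvec(A, z)), x)
A_sym = [[A[i][j] + A[j][i] for j in range(2)] for i in range(2)]
c11_formula = [sum(x[i] * A_sym[i][j] for i in range(2)) for j in range(2)]

print(c8_numeric, c8_formula)
print(c11_numeric, c11_formula)
```

Using a non-symmetric `A` matters here: with a symmetric matrix the check could not tell $\mathbf{x}^\top(\mathbf{A} + \mathbf{A}^\top)$ apart from $2\mathbf{x}^\top\mathbf{A}$.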
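Derivatives of Scalar by Scalar

(SS1) $\frac{\partial (u+v)}{\partial x} = \frac{\partial u}{\partial x} + \frac{\partial v}{\partial x}$
(SS2) $\frac{\partial uv}{\partial x} = u\frac{\partial v}{\partial x} + v\frac{\partial u}{\partial x}$ (product rule)
(SS3) $\frac{\partial g(u)}{\partial x} = \frac{\partial g(u)}{\partial u}\frac{\partial u}{\partial x}$ (chain rule)
(SS4) $\frac{\partial f(g(u))}{\partial x} = \frac{\partial f(g)}{\partial g}\frac{\partial g(u)}{\partial u}\frac{\partial u}{\partial x}$ (chain rule)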
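Derivatives of Vector by Scalar

(VS1) $\frac{\partial a\mathbf{u}}{\partial x} = a\frac{\partial \mathbf{u}}{\partial x}$, where $a$ is not a function of $x$.
(VS2) $\frac{\partial \mathbf{A}\mathbf{u}}{\partial x} = \mathbf{A}\frac{\partial \mathbf{u}}{\partial x}$, where $\mathbf{A}$ is not a function of $x$.
(VS3) $\frac{\partial \mathbf{u}^\top}{\partial x} = \left(\frac{\partial \mathbf{u}}{\partial x}\right)^\top$
(VS4) $\frac{\partial (\mathbf{u}+\mathbf{v})}{\partial x} = \frac{\partial \mathbf{u}}{\partial x} + \frac{\partial \mathbf{v}}{\partial x}$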
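(VS5) $\frac{\partial \mathbf{g}(\mathbf{u})}{\partial x} = \frac{\partial \mathbf{g}(\mathbf{u})}{\partial \mathbf{u}}\frac{\partial \mathbf{u}}{\partial x}$ (chain rule), with consistent matrix layout.
(VS6) $\frac{\partial \mathbf{f}(\mathbf{g}(\mathbf{u}))}{\partial x} = \frac{\partial \mathbf{f}(\mathbf{g})}{\partial \mathbf{g}}\frac{\partial \mathbf{g}(\mathbf{u})}{\partial \mathbf{u}}\frac{\partial \mathbf{u}}{\partial x}$ (chain rule), with consistent matrix layout.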
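Derivatives of Matrix by Scalar

(MS1) $\frac{\partial a\mathbf{U}}{\partial x} = a\frac{\partial \mathbf{U}}{\partial x}$, where $a$ is not a function of $x$.
(MS2) $\frac{\partial \mathbf{A}\mathbf{U}\mathbf{B}}{\partial x} = \mathbf{A}\frac{\partial \mathbf{U}}{\partial x}\mathbf{B}$, where $\mathbf{A}$ and $\mathbf{B}$ are not functions of $x$.
(MS3) $\frac{\partial (\mathbf{U}+\mathbf{V})}{\partial x} = \frac{\partial \mathbf{U}}{\partial x} + \frac{\partial \mathbf{V}}{\partial x}$
(MS4) $\frac{\partial \mathbf{U}\mathbf{V}}{\partial x} = \mathbf{U}\frac{\partial \mathbf{V}}{\partial x} + \frac{\partial \mathbf{U}}{\partial x}\mathbf{V}$ (product rule)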
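Derivatives of Scalar by Vector

(SV1) $\frac{\partial au}{\partial \mathbf{x}} = a\frac{\partial u}{\partial \mathbf{x}}$, where $a$ is not a function of $\mathbf{x}$.
(SV2) $\frac{\partial (u+v)}{\partial \mathbf{x}} = \frac{\partial u}{\partial \mathbf{x}} + \frac{\partial v}{\partial \mathbf{x}}$
(SV3) $\frac{\partial uv}{\partial \mathbf{x}} = u\frac{\partial v}{\partial \mathbf{x}} + v\frac{\partial u}{\partial \mathbf{x}}$ (product rule)
(SV4) $\frac{\partial g(u)}{\partial \mathbf{x}} = \frac{\partial g(u)}{\partial u}\frac{\partial u}{\partial \mathbf{x}}$ (chain rule)
(SV5) $\frac{\partial f(g(u))}{\partial \mathbf{x}} = \frac{\partial f(g)}{\partial g}\frac{\partial g(u)}{\partial u}\frac{\partial u}{\partial \mathbf{x}}$ (chain rule)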
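(SV6) $\frac{\partial \mathbf{u}^\top \mathbf{v}}{\partial \mathbf{x}} = \mathbf{u}^\top\frac{\partial \mathbf{v}}{\partial \mathbf{x}} + \mathbf{v}^\top\frac{\partial \mathbf{u}}{\partial \mathbf{x}}$ (product rule), where $\frac{\partial \mathbf{u}}{\partial \mathbf{x}}$ and $\frac{\partial \mathbf{v}}{\partial \mathbf{x}}$ are in numerator layout.
(SV7) $\frac{\partial \mathbf{u}^\top \mathbf{A}\mathbf{v}}{\partial \mathbf{x}} = \mathbf{u}^\top \mathbf{A}\frac{\partial \mathbf{v}}{\partial \mathbf{x}} + \mathbf{v}^\top \mathbf{A}^\top\frac{\partial \mathbf{u}}{\partial \mathbf{x}}$ (product rule), where $\frac{\partial \mathbf{u}}{\partial \mathbf{x}}$ and $\frac{\partial \mathbf{v}}{\partial \mathbf{x}}$ are in numerator layout, and $\mathbf{A}$ is not a function of $\mathbf{x}$.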
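The product rule (SV6) can be sanity-checked on a small example. This is a hedged sketch rather than a proof: the vector functions `u` and `v`, their hand-derived numerator-layout Jacobians `du_dx` and `dv_dx`, and the helper names are all assumptions introduced for illustration.

```python
def grad_row(f, x, h=1e-6):
    """Central-difference derivative of scalar f by vector x, as a row (numerator layout)."""
    g = []
    for j in range(len(x)):
        xp = list(x); xp[j] += h
        xm = list(x); xm[j] -= h
        g.append((f(xp) - f(xm)) / (2 * h))
    return g

def row_times_matrix(r, M):
    return [sum(r[i] * M[i][j] for i in range(len(r))) for j in range(len(M[0]))]

# Hypothetical vector functions of x = (x0, x1) and their numerator-layout Jacobians.
def u(x): return [x[0] ** 2, x[1]]
def v(x): return [x[1], x[0] * x[1]]
def du_dx(x): return [[2 * x[0], 0.0], [0.0, 1.0]]
def dv_dx(x): return [[0.0, 1.0], [x[1], x[0]]]

x = [1.0, 2.0]

# Left-hand side: differentiate the scalar u^T v directly.
lhs = grad_row(lambda z: sum(ui * vi for ui, vi in zip(u(z), v(z))), x)

# Right-hand side of (SV6): u^T dv/dx + v^T du/dx.
rhs = [p + q for p, q in zip(row_times_matrix(u(x), dv_dx(x)),
                             row_times_matrix(v(x), du_dx(x)))]

print(lhs, rhs)
```

Here $\mathbf{u}^\top\mathbf{v} = x_0^2 x_1 + x_0 x_1^2$, whose derivative at (1, 2) is the row [8, 5]; both sides agree with that value up to finite-difference error.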
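Derivatives of Scalar by Matrix

(SM1) $\frac{\partial au}{\partial \mathbf{X}} = a\frac{\partial u}{\partial \mathbf{X}}$, where $a$ is not a function of $\mathbf{X}$.
(SM2) $\frac{\partial (u+v)}{\partial \mathbf{X}} = \frac{\partial u}{\partial \mathbf{X}} + \frac{\partial v}{\partial \mathbf{X}}$
(SM3) $\frac{\partial uv}{\partial \mathbf{X}} = u\frac{\partial v}{\partial \mathbf{X}} + v\frac{\partial u}{\partial \mathbf{X}}$ (product rule)
(SM4) $\frac{\partial g(u)}{\partial \mathbf{X}} = \frac{\partial g(u)}{\partial u}\frac{\partial u}{\partial \mathbf{X}}$ (chain rule)
(SM5) $\frac{\partial f(g(u))}{\partial \mathbf{X}} = \frac{\partial f(g)}{\partial g}\frac{\partial g(u)}{\partial u}\frac{\partial u}{\partial \mathbf{X}}$ (chain rule)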
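Derivatives of Vector by Vector

(VV1) $\frac{\partial a\mathbf{u}}{\partial \mathbf{x}} = a\frac{\partial \mathbf{u}}{\partial \mathbf{x}} + \mathbf{u}\frac{\partial a}{\partial \mathbf{x}}$ (product rule)
(VV2) $\frac{\partial \mathbf{A}\mathbf{u}}{\partial \mathbf{x}} = \mathbf{A}\frac{\partial \mathbf{u}}{\partial \mathbf{x}}$, where $\mathbf{A}$ is not a function of $\mathbf{x}$.
(VV3) $\frac{\partial (\mathbf{u}+\mathbf{v})}{\partial \mathbf{x}} = \frac{\partial \mathbf{u}}{\partial \mathbf{x}} + \frac{\partial \mathbf{v}}{\partial \mathbf{x}}$
(VV4) $\frac{\partial \mathbf{g}(\mathbf{u})}{\partial \mathbf{x}} = \frac{\partial \mathbf{g}(\mathbf{u})}{\partial \mathbf{u}}\frac{\partial \mathbf{u}}{\partial \mathbf{x}}$ (chain rule)
(VV5) $\frac{\partial \mathbf{f}(\mathbf{g}(\mathbf{u}))}{\partial \mathbf{x}} = \frac{\partial \mathbf{f}(\mathbf{g})}{\partial \mathbf{g}}\frac{\partial \mathbf{g}(\mathbf{u})}{\partial \mathbf{u}}\frac{\partial \mathbf{u}}{\partial \mathbf{x}}$ (chain rule)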
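Notes on Denominator Layout

In some cases, the results of denominator layout are the transpose of those of numerator layout. Moreover, the chain rule for denominator layout goes from right to left instead of left to right.

\[
\begin{array}{lll}
 & \text{Numerator Layout Notation} & \text{Denominator Layout Notation} \\ \hline
\text{(C6)} & \frac{d\,\mathbf{a}^\top\mathbf{x}}{d\mathbf{x}} = \mathbf{a}^\top & \frac{d\,\mathbf{a}^\top\mathbf{x}}{d\mathbf{x}} = \mathbf{a} \\
\text{(C11)} & \frac{d\,\mathbf{x}^\top\mathbf{A}\mathbf{x}}{d\mathbf{x}} = \mathbf{x}^\top(\mathbf{A}+\mathbf{A}^\top) & \frac{d\,\mathbf{x}^\top\mathbf{A}\mathbf{x}}{d\mathbf{x}} = (\mathbf{A}+\mathbf{A}^\top)\mathbf{x} \\
\text{(VV5)} & \frac{\partial \mathbf{f}(\mathbf{g}(\mathbf{u}))}{\partial \mathbf{x}} = \frac{\partial \mathbf{f}(\mathbf{g})}{\partial \mathbf{g}}\frac{\partial \mathbf{g}(\mathbf{u})}{\partial \mathbf{u}}\frac{\partial \mathbf{u}}{\partial \mathbf{x}} & \frac{\partial \mathbf{f}(\mathbf{g}(\mathbf{u}))}{\partial \mathbf{x}} = \frac{\partial \mathbf{u}}{\partial \mathbf{x}}\frac{\partial \mathbf{g}(\mathbf{u})}{\partial \mathbf{u}}\frac{\partial \mathbf{f}(\mathbf{g})}{\partial \mathbf{g}}
\end{array}
\]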
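The reversed order of the chain rule (VV5) in denominator layout follows from the transpose of a product. The sketch below uses linear maps so that every Jacobian is a constant matrix; the matrices `U`, `G`, `F` and the helper names are hypothetical and chosen only for illustration.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(M):
    return [list(row) for row in zip(*M)]

# Linear maps u = U x, g = G u, f = F g, so the numerator-layout Jacobians
# are du/dx = U (3 x 2), dg/du = G (2 x 3) and df/dg = F (2 x 2).
U = [[1, 0], [2, 1], [0, 3]]
G = [[1, 2, 0], [0, 1, 1]]
F = [[2, 1], [0, 1]]

# Numerator layout: multiply left to right, df/dg * dg/du * du/dx.
J_num = matmul(matmul(F, G), U)

# Denominator layout: every factor is transposed and the order is reversed.
J_den = matmul(matmul(transpose(U), transpose(G)), transpose(F))

print(J_num)  # [[12, 8], [2, 4]]
print(J_den)  # [[12, 2], [8, 4]]
```

`J_den` is the transpose of `J_num`, as expected. Keeping the left-to-right order with the transposed factors would not even be conformable here, since it would multiply a 2 x 2 matrix by a 3 x 2 one.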
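Derivations of Derivatives

(C6) $\frac{d\,\mathbf{a}^\top\mathbf{x}}{d\mathbf{x}} = \frac{d\,\mathbf{x}^\top\mathbf{a}}{d\mathbf{x}} = \mathbf{a}^\top$

(The not-so-hard way) Let $s = \mathbf{a}^\top\mathbf{x} = a_1 x_1 + \cdots + a_n x_n$. Then, $\frac{\partial s}{\partial x_i} = a_i$. So, $\frac{ds}{d\mathbf{x}} = \mathbf{a}^\top$.

(The easier way) Let $s = \mathbf{a}^\top\mathbf{x} = \sum_i a_i x_i$. Then, $\frac{\partial s}{\partial x_i} = a_i$. So, $\frac{ds}{d\mathbf{x}} = \mathbf{a}^\top$.

(C7) $\frac{d\,\mathbf{x}^\top\mathbf{x}}{d\mathbf{x}} = 2\mathbf{x}^\top$

Let $s = \mathbf{x}^\top\mathbf{x} = \sum_i x_i^2$. Then, $\frac{\partial s}{\partial x_i} = 2x_i$. So, $\frac{ds}{d\mathbf{x}} = 2\mathbf{x}^\top$.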
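(C8) $\frac{d\,(\mathbf{x}^\top\mathbf{a})^2}{d\mathbf{x}} = 2\,\mathbf{x}^\top\mathbf{a}\,\mathbf{a}^\top$

Let $s = \mathbf{x}^\top\mathbf{a}$. Then, $\frac{\partial s^2}{\partial x_i} = 2s\frac{\partial s}{\partial x_i} = 2s a_i$. So, $\frac{ds^2}{d\mathbf{x}} = 2\,\mathbf{x}^\top\mathbf{a}\,\mathbf{a}^\top$.

(C9) $\frac{d\,\mathbf{A}\mathbf{x}}{d\mathbf{x}} = \mathbf{A}$

(The hard way)
\[
\mathbf{A}\mathbf{x} =
\begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{n1} & \cdots & a_{nn} \end{bmatrix}
\begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}
=
\begin{bmatrix} a_{11}x_1 + \cdots + a_{1n}x_n \\ \vdots \\ a_{n1}x_1 + \cdots + a_{nn}x_n \end{bmatrix}
\]

(The easy way) Let $\mathbf{s} = \mathbf{A}\mathbf{x}$. Then, $s_i = \sum_j a_{ij}x_j$, and $\frac{\partial s_i}{\partial x_j} = a_{ij}$. So, $\frac{d\mathbf{s}}{d\mathbf{x}} = \mathbf{A}$.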
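(C10) $\frac{d\,\mathbf{x}^\top\mathbf{A}}{d\mathbf{x}} = \mathbf{A}^\top$

Let $\mathbf{y}^\top = \mathbf{x}^\top\mathbf{A}$, and let $\mathbf{a}_i$ denote the $i$-th column of $\mathbf{A}$. Then, $y_i = \mathbf{x}^\top\mathbf{a}_i$. Applying (C6) yields $\frac{dy_i}{d\mathbf{x}} = \mathbf{a}_i^\top$. So, $\frac{d\mathbf{y}}{d\mathbf{x}} = \mathbf{A}^\top$.

(C11) $\frac{d\,\mathbf{x}^\top\mathbf{A}\mathbf{x}}{d\mathbf{x}} = \mathbf{x}^\top(\mathbf{A}+\mathbf{A}^\top)$

Apply (SV6) to $\frac{d\,\mathbf{x}^\top\mathbf{A}\mathbf{x}}{d\mathbf{x}}$ and obtain
\[
\mathbf{x}^\top\frac{d\,\mathbf{A}\mathbf{x}}{d\mathbf{x}} + (\mathbf{A}\mathbf{x})^\top\frac{d\mathbf{x}}{d\mathbf{x}}.
\]
Next, apply (C9) to the first part of the sum and (C5) to the second, and obtain $\mathbf{x}^\top\mathbf{A} + (\mathbf{A}\mathbf{x})^\top$, which is $\mathbf{x}^\top(\mathbf{A}+\mathbf{A}^\top)$.

(Proving (SV6) is left as homework.)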
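Derivatives of Trace

For variable matrix $\mathbf{X}$ and constant matrices $\mathbf{A}$, $\mathbf{B}$, $\mathbf{C}$,

(DT1) $\frac{\partial\,\mathrm{tr}(\mathbf{X})}{\partial \mathbf{X}} = \mathbf{I}$
(DT2) $\frac{\partial\,\mathrm{tr}(\mathbf{X}^k)}{\partial \mathbf{X}} = k\mathbf{X}^{k-1}$
(DT3) $\frac{\partial\,\mathrm{tr}(\mathbf{A}\mathbf{X})}{\partial \mathbf{X}} = \frac{\partial\,\mathrm{tr}(\mathbf{X}\mathbf{A})}{\partial \mathbf{X}} = \mathbf{A}$
(DT4) $\frac{\partial\,\mathrm{tr}(\mathbf{A}\mathbf{X}^\top)}{\partial \mathbf{X}} = \frac{\partial\,\mathrm{tr}(\mathbf{X}^\top\mathbf{A})}{\partial \mathbf{X}} = \mathbf{A}^\top$
(DT5) $\frac{\partial\,\mathrm{tr}(\mathbf{X}^\top\mathbf{A}\mathbf{X})}{\partial \mathbf{X}} = \mathbf{X}^\top(\mathbf{A}+\mathbf{A}^\top)$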
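(DT6) $\frac{\partial\,\mathrm{tr}(\mathbf{X}^{-1}\mathbf{A})}{\partial \mathbf{X}} = -\mathbf{X}^{-1}\mathbf{A}\mathbf{X}^{-1}$
(DT7) $\frac{\partial\,\mathrm{tr}(\mathbf{A}\mathbf{X}\mathbf{B})}{\partial \mathbf{X}} = \frac{\partial\,\mathrm{tr}(\mathbf{B}\mathbf{A}\mathbf{X})}{\partial \mathbf{X}} = \mathbf{B}\mathbf{A}$
(DT8) $\frac{\partial\,\mathrm{tr}(\mathbf{A}\mathbf{X}\mathbf{B}\mathbf{X}^\top\mathbf{C})}{\partial \mathbf{X}} = \mathbf{B}\mathbf{X}^\top\mathbf{C}\mathbf{A} + \mathbf{B}^\top\mathbf{X}^\top\mathbf{A}^\top\mathbf{C}^\top$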
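The trace identities can be verified entry by entry with finite differences. The following is a minimal sketch under stated assumptions: it is restricted to 2 x 2 matrices, the helper `deriv_numerator` (which stores the derivative with respect to x_ij at position (j, i), as numerator layout requires) and the test matrices are illustrative, and only (DT3), (DT5) and (DT6) are checked.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(M):
    return [list(row) for row in zip(*M)]

def trace(M):
    return sum(M[i][i] for i in range(len(M)))

def inv2(M):
    """Inverse of a 2 x 2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def deriv_numerator(f, X, h=1e-6):
    """Central-difference dy/dX in numerator layout: entry (j, i) = dy/dx_ij."""
    rows, cols = len(X), len(X[0])
    D = [[0.0] * rows for _ in range(cols)]
    for i in range(rows):
        for j in range(cols):
            Xp = [r[:] for r in X]; Xp[i][j] += h
            Xm = [r[:] for r in X]; Xm[i][j] -= h
            D[j][i] = (f(Xp) - f(Xm)) / (2 * h)
    return D

def max_abs_diff(P, Q):
    return max(abs(p - q) for rp, rq in zip(P, Q) for p, q in zip(rp, rq))

X = [[2.0, 1.0], [0.5, 3.0]]  # invertible
A = [[1.0, 2.0], [3.0, 4.0]]  # non-symmetric, so A and A^T are distinguishable

# (DT3): d tr(AX) / dX = A
err_dt3 = max_abs_diff(deriv_numerator(lambda M: trace(matmul(A, M)), X), A)

# (DT5): d tr(X^T A X) / dX = X^T (A + A^T)
A_sym = [[A[i][j] + A[j][i] for j in range(2)] for i in range(2)]
err_dt5 = max_abs_diff(
    deriv_numerator(lambda M: trace(matmul(matmul(transpose(M), A), M)), X),
    matmul(transpose(X), A_sym))

# (DT6): d tr(X^{-1} A) / dX = -X^{-1} A X^{-1}
Xi = inv2(X)
dt6 = [[-val for val in row] for row in matmul(matmul(Xi, A), Xi)]
err_dt6 = max_abs_diff(deriv_numerator(lambda M: trace(matmul(inv2(M), A)), X), dt6)

print(err_dt3, err_dt5, err_dt6)
```

All three errors should be at the level of finite-difference noise. Storing the entry at `D[j][i]` rather than `D[i][j]` is what makes the result match the numerator-layout formulas; the other choice would give their transposes.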
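Derivatives of Determinant

For variable matrix $\mathbf{X}$ and constant matrices $\mathbf{A}$, $\mathbf{B}$,

(DD1) $\frac{\partial |\mathbf{X}|}{\partial \mathbf{X}} = |\mathbf{X}|\,\mathbf{X}^{-1}$
(DD2) $\frac{\partial |\mathbf{X}^k|}{\partial \mathbf{X}} = k\,|\mathbf{X}^k|\,\mathbf{X}^{-1}$
(DD3) $\frac{\partial \log|\mathbf{X}|}{\partial \mathbf{X}} = \mathbf{X}^{-1}$
(DD4) $\frac{\partial |\mathbf{A}\mathbf{X}\mathbf{B}|}{\partial \mathbf{X}} = |\mathbf{A}\mathbf{X}\mathbf{B}|\,\mathbf{X}^{-1}$
(DD5) $\frac{\partial |\mathbf{X}^\top\mathbf{A}\mathbf{X}|}{\partial \mathbf{X}} = 2\,|\mathbf{X}^\top\mathbf{A}\mathbf{X}|\,\mathbf{X}^{-1}$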
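A finite-difference check of (DD1) and (DD3) is sketched below. It is an illustration only, assuming a 2 x 2 matrix with positive determinant so that the logarithm is defined; the helper names and the test matrix are not from the slides.

```python
import math

def det2(M):
    """Determinant of a 2 x 2 matrix."""
    (a, b), (c, d) = M
    return a * d - b * c

def inv2(M):
    """Inverse of a 2 x 2 matrix."""
    (a, b), (c, d) = M
    det = det2(M)
    return [[d / det, -b / det], [-c / det, a / det]]

def deriv_numerator(f, X, h=1e-6):
    """Central-difference dy/dX in numerator layout: entry (j, i) = dy/dx_ij."""
    rows, cols = len(X), len(X[0])
    D = [[0.0] * rows for _ in range(cols)]
    for i in range(rows):
        for j in range(cols):
            Xp = [r[:] for r in X]; Xp[i][j] += h
            Xm = [r[:] for r in X]; Xm[i][j] -= h
            D[j][i] = (f(Xp) - f(Xm)) / (2 * h)
    return D

def max_abs_diff(P, Q):
    return max(abs(p - q) for rp, rq in zip(P, Q) for p, q in zip(rp, rq))

X = [[2.0, 1.0], [0.5, 3.0]]  # |X| = 5.5 > 0, so log|X| is defined
Xi = inv2(X)

# (DD1): d|X| / dX = |X| X^{-1}
dd1 = [[det2(X) * val for val in row] for row in Xi]
err_dd1 = max_abs_diff(deriv_numerator(det2, X), dd1)

# (DD3): d log|X| / dX = X^{-1}
err_dd3 = max_abs_diff(deriv_numerator(lambda M: math.log(det2(M)), X), Xi)

print(err_dd1, err_dd3)
```

For this matrix $|\mathbf{X}|\,\mathbf{X}^{-1}$ is the adjugate [[3, -1], [-0.5, 2]], which is what the numerical derivative of the determinant reproduces.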
Derivations of Derivatives

(DD1) $\frac{\partial |\mathbf{X}|}{\partial \mathbf{X}} = |\mathbf{X}|\,\mathbf{X}^{-1}$

For an $n \times n$ matrix $\mathbf{X}$, Laplace's formula expresses $|\mathbf{X}|$ as
\[
|\mathbf{X}| = \sum_{i=1}^{n} (-1)^{i+j} x_{ij} M_{ij},
\]
where $M_{ij}$ is a minor, which is the determinant of the sub-matrix of $\mathbf{X}$ obtained by removing the $i$-th row and $j$-th column.

The adjugate $\mathrm{adj}(\mathbf{X})$ is the transpose of the matrix consisting of the cofactors $(-1)^{i+j} M_{ij}$:
\[
\mathrm{adj}(\mathbf{X})_{ji} = (-1)^{i+j} M_{ij},
\]
which is independent of $x_{ij}$, element $(i,j)$ of $\mathbf{X}$.
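So,
\[
|\mathbf{X}| = \sum_{i=1}^{n} x_{ij}\,\mathrm{adj}(\mathbf{X})_{ji}.
\]
Then,
\[
\frac{\partial |\mathbf{X}|}{\partial x_{ij}} = \mathrm{adj}(\mathbf{X})_{ji},
\]
and, in numerator layout,
\[
\frac{\partial |\mathbf{X}|}{\partial \mathbf{X}} = \mathrm{adj}(\mathbf{X}).
\]
$\mathbf{X}$ and $\mathrm{adj}(\mathbf{X})$ are also related by
\[
|\mathbf{X}|\,\mathbf{I} = \mathrm{adj}(\mathbf{X})\,\mathbf{X}.
\]
So,
\[
\frac{\partial |\mathbf{X}|}{\partial \mathbf{X}} = |\mathbf{X}|\,\mathbf{X}^{-1}.
\]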
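A 2 x 2 case makes the role of the adjugate in (DD1) explicit. The worked example below is an illustration added here, with the entry names $a$, $b$, $c$, $d$ chosen only for this purpose.

```latex
\[
\mathbf{X} = \begin{bmatrix} a & b \\ c & d \end{bmatrix},
\qquad
|\mathbf{X}| = ad - bc .
\]
% Numerator layout places the derivative with respect to x_{ij} at position (j, i).
\[
\frac{\partial |\mathbf{X}|}{\partial \mathbf{X}}
= \begin{bmatrix}
\frac{\partial |\mathbf{X}|}{\partial a} & \frac{\partial |\mathbf{X}|}{\partial c} \\[1ex]
\frac{\partial |\mathbf{X}|}{\partial b} & \frac{\partial |\mathbf{X}|}{\partial d}
\end{bmatrix}
= \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}
= \mathrm{adj}(\mathbf{X})
= |\mathbf{X}|\,\mathbf{X}^{-1} .
\]
```

The last equality is the familiar formula for the inverse of a 2 x 2 matrix, and it holds whenever $ad - bc \neq 0$.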
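(DD2) $\frac{\partial |\mathbf{X}^k|}{\partial \mathbf{X}} = k\,|\mathbf{X}^k|\,\mathbf{X}^{-1}$

Note that $|\mathbf{X}^k| = |\mathbf{X}|^k$. So,
\[
\frac{\partial |\mathbf{X}^k|}{\partial \mathbf{X}} = \frac{\partial |\mathbf{X}|^k}{\partial \mathbf{X}} = k\,|\mathbf{X}|^{k-1}\,|\mathbf{X}|\,\mathbf{X}^{-1} = k\,|\mathbf{X}^k|\,\mathbf{X}^{-1}.
\]

(DD3) $\frac{\partial \log|\mathbf{X}|}{\partial \mathbf{X}} = \mathbf{X}^{-1}$
\[
\frac{\partial \log|\mathbf{X}|}{\partial \mathbf{X}} = \frac{1}{|\mathbf{X}|}\frac{\partial |\mathbf{X}|}{\partial \mathbf{X}} = \mathbf{X}^{-1}.
\]

(DD4) $\frac{\partial |\mathbf{A}\mathbf{X}\mathbf{B}|}{\partial \mathbf{X}} = |\mathbf{A}\mathbf{X}\mathbf{B}|\,\mathbf{X}^{-1}$

Note that, for square $\mathbf{A}$ and $\mathbf{B}$, $|\mathbf{A}\mathbf{X}\mathbf{B}| = |\mathbf{A}|\,|\mathbf{X}|\,|\mathbf{B}|$. So,
\[
\frac{\partial |\mathbf{A}\mathbf{X}\mathbf{B}|}{\partial \mathbf{X}} = |\mathbf{A}|\,|\mathbf{X}|\,\mathbf{X}^{-1}\,|\mathbf{B}| = |\mathbf{A}\mathbf{X}\mathbf{B}|\,\mathbf{X}^{-1}.
\]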
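(DD5) $\frac{\partial |\mathbf{X}^\top\mathbf{A}\mathbf{X}|}{\partial \mathbf{X}} = 2\,|\mathbf{X}^\top\mathbf{A}\mathbf{X}|\,\mathbf{X}^{-1}$

Note that $|\mathbf{X}^\top| = |\mathbf{X}|$. So,
\[
\frac{\partial |\mathbf{X}^\top\mathbf{A}\mathbf{X}|}{\partial \mathbf{X}}
= |\mathbf{X}^\top|\,\mathbf{X}^{-1}\,|\mathbf{A}|\,|\mathbf{X}| + |\mathbf{X}^\top|\,|\mathbf{A}|\,|\mathbf{X}|\,\mathbf{X}^{-1}
= 2\,|\mathbf{X}^\top\mathbf{A}\mathbf{X}|\,\mathbf{X}^{-1}.
\]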
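(DT1) $\frac{\partial\,\mathrm{tr}(\mathbf{X})}{\partial \mathbf{X}} = \mathbf{I}$

For an $n \times n$ matrix $\mathbf{X}$,
\[
\mathrm{tr}(\mathbf{X}) = \sum_{i=1}^{n} x_{ii}.
\]
Then,
\[
\frac{\partial\,\mathrm{tr}(\mathbf{X})}{\partial x_{ij}} =
\begin{cases} 1 & \text{if } i = j, \\ 0 & \text{if } i \neq j. \end{cases}
\]
So, $\frac{\partial\,\mathrm{tr}(\mathbf{X})}{\partial \mathbf{X}} = \mathbf{I}$.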
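(DT2) $\frac{\partial\,\mathrm{tr}(\mathbf{X}^k)}{\partial \mathbf{X}} = k\mathbf{X}^{k-1}$

\[
(\mathbf{X}^k)_{ij} = \sum_{l_1} \cdots \sum_{l_{k-1}} x_{i l_1} x_{l_1 l_2} \cdots x_{l_{k-1} j}
\]
\[
\mathrm{tr}(\mathbf{X}^k) = \sum_{l_0} \sum_{l_1} \cdots \sum_{l_{k-1}} x_{l_0 l_1} x_{l_1 l_2} \cdots x_{l_{k-1} l_0}
\]
\[
\begin{aligned}
\frac{\partial\,\mathrm{tr}(\mathbf{X}^k)}{\partial x_{ij}}
={}& \sum_{l_2} \cdots \sum_{l_{k-1}} \overbrace{x_{j l_2} x_{l_2 l_3} \cdots x_{l_{k-1} i}}^{k-1 \text{ factors}} && (l_0 = i,\ l_1 = j) \\
& + \cdots && (k-2 \text{ terms}) \\
& + \sum_{l_1} \cdots \sum_{l_{k-2}} x_{j l_1} x_{l_1 l_2} \cdots x_{l_{k-2} i} && (l_{k-1} = i,\ l_0 = j)
\end{aligned}
\]
Each of the $k$ terms equals $(\mathbf{X}^{k-1})_{ji}$. So, in numerator layout,
\[
\frac{\partial\,\mathrm{tr}(\mathbf{X}^k)}{\partial \mathbf{X}} = k\mathbf{X}^{k-1}.
\]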
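(DT3) $\frac{\partial\,\mathrm{tr}(\mathbf{A}\mathbf{X})}{\partial \mathbf{X}} = \frac{\partial\,\mathrm{tr}(\mathbf{X}\mathbf{A})}{\partial \mathbf{X}} = \mathbf{A}$

\[
(\mathbf{A}\mathbf{X})_{ij} = \sum_k a_{ik} x_{kj}
\]
\[
\mathrm{tr}(\mathbf{A}\mathbf{X}) = \sum_l \sum_k a_{lk} x_{kl}
\]
\[
\frac{\partial\,\mathrm{tr}(\mathbf{A}\mathbf{X})}{\partial x_{ij}} = a_{ji}
\]
So, in numerator layout, $\frac{\partial\,\mathrm{tr}(\mathbf{A}\mathbf{X})}{\partial \mathbf{X}} = \mathbf{A}$.

Caution: $\mathrm{tr}(\mathbf{A}\mathbf{B}) \neq \mathrm{tr}(\mathbf{A})\,\mathrm{tr}(\mathbf{B})$.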
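(DT4) $\frac{\partial\,\mathrm{tr}(\mathbf{A}\mathbf{X}^\top)}{\partial \mathbf{X}} = \frac{\partial\,\mathrm{tr}(\mathbf{X}^\top\mathbf{A})}{\partial \mathbf{X}} = \mathbf{A}^\top$

Similar to the derivation for DT3,
\[
(\mathbf{A}\mathbf{X}^\top)_{ij} = \sum_k a_{ik} x_{jk}
\]
\[
\mathrm{tr}(\mathbf{A}\mathbf{X}^\top) = \sum_l \sum_k a_{lk} x_{lk}
\]
\[
\frac{\partial\,\mathrm{tr}(\mathbf{A}\mathbf{X}^\top)}{\partial x_{ij}} = a_{ij}
\]
So, in numerator layout, $\frac{\partial\,\mathrm{tr}(\mathbf{A}\mathbf{X}^\top)}{\partial \mathbf{X}} = \mathbf{A}^\top$.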
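(DT5) $\frac{\partial\,\mathrm{tr}(\mathbf{X}^\top\mathbf{A}\mathbf{X})}{\partial \mathbf{X}} = \mathbf{X}^\top(\mathbf{A}+\mathbf{A}^\top)$

\[
(\mathbf{X}^\top\mathbf{A}\mathbf{X})_{ij} = \sum_k \sum_l x_{ki} a_{kl} x_{lj}
\]
\[
\mathrm{tr}(\mathbf{X}^\top\mathbf{A}\mathbf{X}) = \sum_r \sum_k \sum_l x_{kr} a_{kl} x_{lr}
\]
\[
\frac{\partial\,\mathrm{tr}(\mathbf{X}^\top\mathbf{A}\mathbf{X})}{\partial x_{ij}}
= \sum_l a_{il} x_{lj} + \sum_k x_{kj} a_{ki}
= \sum_k x_{kj} a_{ki} + \sum_l x_{lj} a_{il}.
\]
So, in numerator layout,
\[
\frac{\partial\,\mathrm{tr}(\mathbf{X}^\top\mathbf{A}\mathbf{X})}{\partial \mathbf{X}} = \mathbf{X}^\top\mathbf{A} + \mathbf{X}^\top\mathbf{A}^\top = \mathbf{X}^\top(\mathbf{A}+\mathbf{A}^\top).
\]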
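(DT7) DT7 is a direct result of DT3.

(DT8) $\frac{\partial\,\mathrm{tr}(\mathbf{A}\mathbf{X}\mathbf{B}\mathbf{X}^\top\mathbf{C})}{\partial \mathbf{X}} = \mathbf{B}\mathbf{X}^\top\mathbf{C}\mathbf{A} + \mathbf{B}^\top\mathbf{X}^\top\mathbf{A}^\top\mathbf{C}^\top$

\[
(\mathbf{A}\mathbf{X}\mathbf{B}\mathbf{X}^\top\mathbf{C})_{ij} = \sum_p \sum_q \sum_r \sum_s a_{ip} x_{pq} b_{qr} x_{sr} c_{sj}
\]
\[
\mathrm{tr}(\mathbf{A}\mathbf{X}\mathbf{B}\mathbf{X}^\top\mathbf{C}) = \sum_k \sum_p \sum_q \sum_r \sum_s a_{kp} x_{pq} b_{qr} x_{sr} c_{sk}
\]
\[
\begin{aligned}
\frac{\partial\,\mathrm{tr}(\mathbf{A}\mathbf{X}\mathbf{B}\mathbf{X}^\top\mathbf{C})}{\partial x_{ij}}
&= \sum_k \sum_r \sum_s a_{ki} b_{jr} x_{sr} c_{sk} + \sum_k \sum_p \sum_q a_{kp} x_{pq} b_{qj} c_{ik} \\
&= \sum_k \sum_r \sum_s b_{jr} x_{sr} c_{sk} a_{ki} + \sum_k \sum_p \sum_q b_{qj} x_{pq} a_{kp} c_{ik}
\end{aligned}
\]
So, in numerator layout,
\[
\frac{\partial\,\mathrm{tr}(\mathbf{A}\mathbf{X}\mathbf{B}\mathbf{X}^\top\mathbf{C})}{\partial \mathbf{X}} = \mathbf{B}\mathbf{X}^\top\mathbf{C}\mathbf{A} + \mathbf{B}^\top\mathbf{X}^\top\mathbf{A}^\top\mathbf{C}^\top.
\]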
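Index-heavy results such as (DT8) are easy to get wrong by a transpose, so a numerical cross-check is worthwhile. The sketch below is an illustration under assumptions: all matrices are 2 x 2 with arbitrarily chosen non-symmetric entries, and the helper names are hypothetical.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(M):
    return [list(row) for row in zip(*M)]

def trace(M):
    return sum(M[i][i] for i in range(len(M)))

def deriv_numerator(f, X, h=1e-6):
    """Central-difference dy/dX in numerator layout: entry (j, i) = dy/dx_ij."""
    rows, cols = len(X), len(X[0])
    D = [[0.0] * rows for _ in range(cols)]
    for i in range(rows):
        for j in range(cols):
            Xp = [r[:] for r in X]; Xp[i][j] += h
            Xm = [r[:] for r in X]; Xm[i][j] -= h
            D[j][i] = (f(Xp) - f(Xm)) / (2 * h)
    return D

def max_abs_diff(P, Q):
    return max(abs(p - q) for rp, rq in zip(P, Q) for p, q in zip(rp, rq))

A = [[1.0, 2.0], [0.0, 1.0]]
B = [[2.0, -1.0], [1.0, 3.0]]
C = [[0.5, 1.0], [2.0, -1.0]]
X = [[1.0, 2.0], [3.0, 4.0]]

def f(M):
    # tr(A X B X^T C)
    return trace(matmul(matmul(matmul(matmul(A, M), B), transpose(M)), C))

# Closed form (DT8): B X^T C A + B^T X^T A^T C^T
Xt = transpose(X)
term1 = matmul(matmul(matmul(B, Xt), C), A)
term2 = matmul(matmul(matmul(transpose(B), Xt), transpose(A)), transpose(C))
formula = [[p + q for p, q in zip(r1, r2)] for r1, r2 in zip(term1, term2)]

err_dt8 = max_abs_diff(deriv_numerator(f, X), formula)
print(err_dt8)
```

Because the trace is quadratic in $\mathbf{X}$, central differences are exact up to rounding, so `err_dt8` should be tiny; a mismatch of order one would indicate a misplaced transpose.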
References

1. J. E. Gentle, Matrix Algebra: Theory, Computations, and Applications in Statistics, Springer, 2007.
2. H. Lütkepohl, Handbook of Matrices, John Wiley & Sons, 1996.
3. K. B. Petersen and M. S. Pedersen, The Matrix Cookbook, 2012. www.math.uwaterloo.ca/~hwolkowi/matrixcookbook.pdf
4. Wikipedia, Matrix Calculus. en.wikipedia.org/wiki/matrix_calculus