Matrix Differentiation

Similar documents
Math 4377/6308 Advanced Linear Algebra

Matrices and Determinants

Linear Algebra Solutions 1

Phys 201. Matrices and Determinants

ICS 6N Computational Linear Algebra Matrix Algebra

MATH 431: FIRST MIDTERM. Thursday, October 3, 2013.

Chapter 4 - MATRIX ALGEBRA. ... a 2j... a 2n. a i1 a i2... a ij... a in

Matrices A brief introduction

On Expected Gaussian Random Determinants

Kevin James. MTHSC 3110 Section 2.1 Matrix Operations

Announcements Wednesday, October 10

Announcements Monday, October 02

MA 1B ANALYTIC - HOMEWORK SET 7 SOLUTIONS

Cartesian Tensors. e 2. e 1. General vector (formal definition to follow) denoted by components

Linear Equations and Matrix

Chapter 1 Matrices and Systems of Equations

CS 246 Review of Linear Algebra 01/17/19

Matrix Arithmetic. j=1

Exercise Set Suppose that A, B, C, D, and E are matrices with the following sizes: A B C D E

Fundamentals of Engineering Analysis (650163)

6.1 Matrices. Definition: A Matrix A is a rectangular array of the form. A 11 A 12 A 1n A 21. A 2n. A m1 A m2 A mn A 22.

II. Determinant Functions

TOPIC III LINEAR ALGEBRA

Introduction. Vectors and Matrices. Vectors [1] Vectors [2]

. D Matrix Calculus D 1

2. Linear algebra. matrices and vectors. linear equations. range and nullspace of matrices. function of vectors, gradient and Hessian

Massachusetts Institute of Technology Department of Economics Statistics. Lecture Notes on Matrix Algebra

Principal Component Analysis

Matrices: 2.1 Operations with Matrices

Lecture 4: Products of Matrices

Math Bootcamp An p-dimensional vector is p numbers put together. Written as. x 1 x =. x p

Chapter 5: Matrices. Daniel Chan. Semester UNSW. Daniel Chan (UNSW) Chapter 5: Matrices Semester / 33

Elementary Row Operations on Matrices

ELEMENTARY LINEAR ALGEBRA

Matrix Representation

MATRICES AND MATRIX OPERATIONS

2 Constructions of manifolds. (Solutions)

Mathematical Methods wk 2: Linear Operators

MATRIX ALGEBRA. or x = (x 1,..., x n ) R n. y 1 y 2. x 2. x m. y m. y = cos θ 1 = x 1 L x. sin θ 1 = x 2. cos θ 2 = y 1 L y.

Linear Systems and Matrices

EA = I 3 = E = i=1, i k

Section 9.2: Matrices.. a m1 a m2 a mn

Elementary maths for GMT

Introduction to Matrices

1 Matrices and Systems of Linear Equations

a 11 x 1 + a 12 x a 1n x n = b 1 a 21 x 1 + a 22 x a 2n x n = b 2.

CP3 REVISION LECTURES VECTORS AND MATRICES Lecture 1. Prof. N. Harnew University of Oxford TT 2013

Chapter 1: Systems of linear equations and matrices. Section 1.1: Introduction to systems of linear equations

Math113: Linear Algebra. Beifang Chen

Rank, Trace, Determinant, Transpose an Inverse of a Matrix Let A be an n n square matrix: A = a11 a1 a1n a1 a an a n1 a n a nn nn where is the jth col

3 (Maths) Linear Algebra

Computational Foundations of Cognitive Science. Addition and Scalar Multiplication. Matrix Multiplication

Introduction to Matrix Algebra

Matrix Algebra, Class Notes (part 2) by Hrishikesh D. Vinod Copyright 1998 by Prof. H. D. Vinod, Fordham University, New York. All rights reserved.

1 Matrices and Systems of Linear Equations. a 1n a 2n

. a m1 a mn. a 1 a 2 a = a n

Bare minimum on matrix algebra. Psychology 588: Covariance structure and factor models

A matrix over a field F is a rectangular array of elements from F. The symbol

10. Linear Systems of ODEs, Matrix multiplication, superposition principle (parts of sections )

Section 9.2: Matrices. Definition: A matrix A consists of a rectangular array of numbers, or elements, arranged in m rows and n columns.

Linear Algebra Review. Vectors

Jim Lambers MAT 610 Summer Session Lecture 1 Notes

Lecture 3 Linear Algebra Background

William Stallings Copyright 2010

M. Matrices and Linear Algebra

A FIRST COURSE IN LINEAR ALGEBRA. An Open Text by Ken Kuttler. Matrix Arithmetic

MATRICES. a m,1 a m,n A =

ELEMENTARY LINEAR ALGEBRA

Linear Algebra Primer

Matrix operations Linear Algebra with Computer Science Application

Linear Algebra and Matrix Inversion

Matrix Algebra Determinant, Inverse matrix. Matrices. A. Fabretti. Mathematics 2 A.Y. 2015/2016. A. Fabretti Matrices

Matrix & Linear Algebra

Linear Algebra. Linear Equations and Matrices. Copyright 2005, W.R. Winfrey

Determinants. Samy Tindel. Purdue University. Differential equations and linear algebra - MA 262

Matrices A brief introduction

STAT200C: Review of Linear Algebra

Mathematics 13: Lecture 10

Matrix Algebra: Summary

Lecture Notes in Linear Algebra

Matrices. Math 240 Calculus III. Wednesday, July 10, Summer 2013, Session II. Matrices. Math 240. Definitions and Notation.

CS100: DISCRETE STRUCTURES. Lecture 3 Matrices Ch 3 Pages:

I = i 0,

Equality: Two matrices A and B are equal, i.e., A = B if A and B have the same order and the entries of A and B are the same.

7.5 Operations with Matrices. Copyright Cengage Learning. All rights reserved.

Matrix Arithmetic. a 11 a. A + B = + a m1 a mn. + b. a 11 + b 11 a 1n + b 1n = a m1. b m1 b mn. and scalar multiplication for matrices via.

Introduction to Quantitative Techniques for MSc Programmes SCHOOL OF ECONOMICS, MATHEMATICS AND STATISTICS MALET STREET LONDON WC1E 7HX

Materials engineering Collage \\ Ceramic & construction materials department Numerical Analysis \\Third stage by \\ Dalya Hekmat

ELEMENTARY LINEAR ALGEBRA

7 Matrix Operations. 7.0 Matrix Multiplication + 3 = 3 = 4

Math Camp II. Basic Linear Algebra. Yiqing Xu. Aug 26, 2014 MIT

2 b 3 b 4. c c 2 c 3 c 4

Tensor Analysis in Euclidean Space

Linear Algebra: Lecture notes from Kolman and Hill 9th edition.

Computational Linear Algebra

Determinants - Uniqueness and Properties

MAT 2037 LINEAR ALGEBRA I web:

Math Linear Algebra Final Exam Review Sheet

Matrix Algebra. Matrix Algebra. Chapter 8 - S&B

Lecture 9: Submatrices and Some Special Matrices

Transcription:

Matrix Differentiation CS5240 Theoretical Foundations in Multimedia Leow Wee Kheng Department of Computer Science School of Computing National University of Singapore Leow Wee Kheng (NUS) Matrix Differentiation 1 / 35

Matrix Derivatives There are 6 common types of matrix derivatives: Type Scalar Vector Matrix Scalar Y Vector Matrix Leow Wee Kheng (NUS) Matrix Differentiation 2 / 35

Derivatives by Scalar Numerator Layout Notation Y = = 11. m1 1. m 1n.... mn Denominator Layout Notation [ ] = 1 m Leow Wee Kheng (NUS) Matrix Differentiation 3 / 35

Derivatives by Vector Numerator Layout Notation [ = 1 = ] n 1 1 1 n..... m m 1 n Denominator Layout Notation = 1. n 1 m = 1 1..... 1 m n n Leow Wee Kheng (NUS) Matrix Differentiation 4 / 35

Derivative by Matrix Numerator Layout Notation = 11 m1..... 1n mn Denominator Layout Notation = 11 1n..... m1 mn Leow Wee Kheng (NUS) Matrix Differentiation 5 / 35

Pictorial Representation...... numerator layout denominator layout.. Leow Wee Kheng (NUS) Matrix Differentiation 6 / 35

Caution Most books and papers don t state which convention they use. Reference [2] uses both conventions but clearly differentiate them. [ ] = 1 n = 1. n 1 1 1 m = 1 n..... m m = 1 1..... 1 m 1 n n n It is best not to mix the two conventions in your equations. We adopt numerator layout notation. Leow Wee Kheng (NUS) Matrix Differentiation 7 / 35

Commonly Used Derivatives Commonly Used Derivatives Here, scalar a, vector a and matrix A are not functions of x and x. da (C1) = 0 (column matrix) dx (C2) (C3) (C4) (C5) da dx = 0 (row matrix) da dx = 0 (matrix) da dx = 0 (matrix) dx dx = I Leow Wee Kheng (NUS) Matrix Differentiation 8 / 35

Commonly Used Derivatives (C6) (C7) da x dx = dx a dx = a dx x dx = 2x (C8) d(x a) 2 dx = 2x aa (C9) (C10) dax dx = A dx A dx = A (C11) dx Ax dx = x (A+A ) Leow Wee Kheng (NUS) Matrix Differentiation 9 / 35

Derivatives of Scalar by Scalar Derivatives of Scalar by Scalar (SS1) (u+v) = u + v (SS2) uv = u v +v u (product rule) (SS3) g(u) = g(u) u u (chain rule) (SS4) f(g(u)) = f(g) g g(u) u u (chain rule) Leow Wee Kheng (NUS) Matrix Differentiation 10 / 35

Derivatives of Vector by Scalar Derivatives of Vector by Scalar (VS1) (VS2) (VS3) au = a u where a is not a function of x. Au = A u where A is not a function of x. u = ( ) u (VS4) (u+v) = u + v Leow Wee Kheng (NUS) Matrix Differentiation 11 / 35

Derivatives of Vector by Scalar (VS5) g(u) = g(u) u u (chain rule) with consistent matrix layout. (VS6) f(g(u)) = f(g) g g(u) u u (chain rule) with consistent matrix layout. Leow Wee Kheng (NUS) Matrix Differentiation 12 / 35

Derivatives of Matrix by Scalar Derivatives of Matrix by Scalar (MS1) (MS2) au = a U where a is not a function of x. AUB = A U B where A and B are not functions of x. (MS3) (U+V) = U + V (MS4) UV = U V + U V (product rule) Leow Wee Kheng (NUS) Matrix Differentiation 13 / 35

Derivatives of Scalar by Vector Derivatives of Scalar by Vector (SV1) au = a u where a is not a function of x. (SV2) (u+v) = u + v (SV3) (SV4) uv = u v +v u g(u) = g(u) u u (product rule) (chain rule) (SV5) f(g(u)) = f(g) g g(u) u u (chain rule) Leow Wee Kheng (NUS) Matrix Differentiation 14 / 35

Derivatives of Scalar by Vector (SV6) (SV7) u v = u v +v u where u u Av where u and v = u A v +v A u and v and A is not a function of x. (product rule) are in numerator layout. (product rule) are in numerator layout, Leow Wee Kheng (NUS) Matrix Differentiation 15 / 35

Derivatives of Scalar by Matrix Derivatives of Scalar by Matrix (SM1) (SM2) au = a u where a is not a function of X. (u+v) = u + v (SM3) (SM4) uv = u v u +v g(u) = g(u) u u (product rule) (chain rule) (SM5) f(g(u)) = f(g) g g(u) u u (chain rule) Leow Wee Kheng (NUS) Matrix Differentiation 16 / 35

Derivatives of Vector by Vector Derivatives of Vector by Vector (VV1) au = a u +u a (product rule) (VV2) Au = A u where A is not a function of x. (VV3) (u+v) = u + v (VV4) g(u) = g(u) u u (chain rule) (VV5) f(g(u)) = f(g) g g(u) u u (chain rule) Leow Wee Kheng (NUS) Matrix Differentiation 17 / 35

Notes on Denominator Layout Notes on Denominator Layout In some cases, the results of denominator layout are the transpose of those of numerator layout. Moreover, the chain rule for denominator layout goes from right to left instead of left to right. Numerator Layout Notation Denominator Layout Notation (C7) (C11) (VV5) dx Ax dx f(g(u)) da x dx = a = x (A+A ) = f(g) g g(u) u u dx Ax dx f(g(u)) da x dx = a = (A+A )x = u g(u) f(g) u g Leow Wee Kheng (NUS) Matrix Differentiation 18 / 35

Derivations of Derivatives Derivations of Derivatives (C6) da x dx = dx a dx = a (The not-so-hard way) Let s = a x = a 1 x 1 + +a n x n. Then, (The easier way) Let s = a x = i a i x i. Then, s i = a i. So, ds dx = a. s i = a i. So, ds dx = a. (C7) dx x dx = 2x Let s = x x = i x 2 i. Then, s = 2x i. So, ds i dx = 2x. Leow Wee Kheng (NUS) Matrix Differentiation 19 / 35

Derivations of Derivatives (C8) d(x a) 2 dx = 2x aa Let s = x a. Then, s2 i = 2s s i = 2sa i. So, ds2 dx = 2x aa. (C9) dax dx = A (The hard way) a 11 a 1n Ax =..... a n1 a nn x 1. x n = a 11 x 1 + +a 1n x n. a n1 x 1 + +a nn x n. (The easy way) Let s = Ax. Then, s i = j a ij x j, and s i j = a ij. So, ds dx = A. Leow Wee Kheng (NUS) Matrix Differentiation 20 / 35

Derivations of Derivatives (C10) dx A dx = A Let y = x A, and a j denote the j-th column of A. Then, y i = x a j. Applying (C6) yields dy i dx = a dy j. So, dx = A. (C11) dx Ax dx = x (A+A ) Apply (SV6) to dx Ax dx and obtain x dax dx +(Ax) dx dx, Next, apply (C9) to the first part of the sum, and obtain x A+(Ax), which is x (A+A ). (Need to prove SV6 Homework.) Leow Wee Kheng (NUS) Matrix Differentiation 21 / 35

Derivatives of Trace Derivatives of Trace For variable matrix X and constant matrices A, B, C, (DT1) tr(x) = I (DT2) tr(x k ) = kxk 1 (DT3) tr(ax) = tr(xa) = A (DT4) tr(ax ) = tr(x A) = A (DT5) tr(x AX) = X (A+A ) Leow Wee Kheng (NUS) Matrix Differentiation 22 / 35

Derivatives of Trace (DT6) tr(x 1 A) = X AX (DT7) tr(axb) = tr(bax) = BA (DT8) tr(axbx C) = BX CA+B X A C Leow Wee Kheng (NUS) Matrix Differentiation 23 / 35

Derivatives of Determinant Derivatives of Determinant For variable matrix X and constant matrices A, B, (DD1) X = X X 1 (DD2) (DD3) X k = k Xk X 1 log X = X 1 (DD4) AXB = AXB X 1 (DD5) X AX = 2 X AX X 1 Leow Wee Kheng (NUS) Matrix Differentiation 24 / 35

Derivations of Derivatives Derivations of Derivatives (DD1) X = X X 1 For an n n matrix X, Laplace s formula expresses X as X = n ( 1) i+j x ij M ij, i=1 where M ij is a minor, which is the determinant of the sub-matrix of X obtained by removing the ith row and jth column. The adjugate adj(x) is the transpose of the matrix consisting of the cofactors ( 1) i+j M ij : adj(x) ji = ( 1) i+j M ij, which is independent of x ij, element (i,j) of X. Leow Wee Kheng (NUS) Matrix Differentiation 25 / 35

Derivations of Derivatives So, Then, and, in numerator layout, X = n x ij adj(x) ji. i=1 X and adj(x) are also related by X ij = adj(x) ji, X = adj(x). X I = adj(x)x. So, X = X X 1. Leow Wee Kheng (NUS) Matrix Differentiation 26 / 35

Derivations of Derivatives X k (DD2) = k Xk X 1 Note that X k = X k. So, X k = X k = k X k 1 X X 1 = k X k X 1. (DD3) log X = X 1 log X = 1 X X = X 1. AXB (DD4) = AXB X 1 Note that AXB = A X B. So, AXB = A X X 1 B = AXB X 1. Leow Wee Kheng (NUS) Matrix Differentiation 27 / 35

Derivations of Derivatives (DD5) X AX = 2 X AX X 1 Note that X = X. So, X AX = X X 1 A X + X A X X 1 = 2 X AX X 1. Leow Wee Kheng (NUS) Matrix Differentiation 28 / 35

Derivations of Derivatives (DT1) tr(x) = I For an n n matrix X, Then, So, tr(x) = n x ii. i=1 { tr(x) 1 if i = j, = ij 0 if i j. tr(x) = I. Leow Wee Kheng (NUS) Matrix Differentiation 29 / 35

Derivations of Derivatives (DT2) tr(x k ) ij tr(x k ) = kxk 1 So, in numerator layout, (X k ) ij = l 1 x il1 x l1 l 2 x lk 1 j l k 1 tr(x k ) = x l0 l 1 x l1 l 2 x lk 1 l 0 l 0 l 1 l k 1 = k 1 factors {}}{ x jl2 x l2 l 3 x lk 1 i+ (l 0 = i,l 1 = j) l 2 l k 1 + (k 2 terms) x jl1 x l1 l 2 x lk 2 i (l k 1 = i,l 0 = j) l k 2 l tr(x k ) = kxk 1. Leow Wee Kheng (NUS) Matrix Differentiation 30 / 35

Derivations of Derivatives (DT3) tr(ax) = tr(xa) = A (AX) ij = k tr(ax) = l tr(ax) = a ji, ij a ik x kj a lk x kl k So, in numerator layout, tr(ax) = A. Caution: tr(ab) tr(a) tr(b). Leow Wee Kheng (NUS) Matrix Differentiation 31 / 35

Derivations of Derivatives (DT4) tr(ax ) = tr(x A) Similar to the derivation for DT3, = A (AX ) ij = k a ik x jk tr(ax) = l tr(ax ) ij = a ij a lk x lk k So, in numerator layout, tr(ax ) = A. Leow Wee Kheng (NUS) Matrix Differentiation 32 / 35

Derivations of Derivatives (DT5) tr(x AX) = X (A+A ) (X AX) ij = k tr(x AX) = r tr(x AX) ij = l x ki a kl x lj l x kr a kl x lr k l a il x lj + k x kj a ki = k x kj a ki + l x lj a il. So, in numerator layout, tr(x AX) = X A+X A = X (A+A ). Leow Wee Kheng (NUS) Matrix Differentiation 33 / 35

Derivations of Derivatives (DT7) DT7 is a direct result of DT3. tr(axbx C) (DT8) = BX CA+B X A C (AXBX C) ij = a ip x pq b qr x sr c sj p q r s tr(axbx C) = a kp x pq b qr x sr c sk. k p q r s tr(axbx C) = a ki b jr x sr c sk + a kp x pq b qj c ik ij k r s k p q = b jr x sr c sk a ki + b qj x pq a kp c ik k r s k p q So, in numerator layout, tr(axbx C) = BX CA+B X A C. Leow Wee Kheng (NUS) Matrix Differentiation 34 / 35

References References 1. J. E. Gentle, Matrix Algebra: Theory, Computations, and Applications in Statistics, Springer, 2007. 2. H. Lütkepohl, Handbook of Matrices, John Wiley & Sons, 1996. 3. K. B. Petersen and M. S. Pedersen, The Matrix Cookbook, 2012. www.math.uwaterloo.ca/~hwolkowi/matrixcookbook.pdf 4. Wikipedia, Matrix Calculus. en.wikipedia.org/wiki/matrix_calculus Leow Wee Kheng (NUS) Matrix Differentiation 35 / 35