Matrix Recipes

Javier R. Movellan

December 28, 2006

Copyright © 2004 Javier R. Movellan

1 Definitions

Let $x$, $y$ be matrices of order $m \times n$ and $o \times p$ respectively, i.e.,

$$x = \begin{pmatrix} x_{11} & \cdots & x_{1n} \\ \vdots & & \vdots \\ x_{m1} & \cdots & x_{mn} \end{pmatrix}, \qquad y = \begin{pmatrix} y_{11} & \cdots & y_{1p} \\ \vdots & & \vdots \\ y_{o1} & \cdots & y_{op} \end{pmatrix} \tag{1}$$

We refer to a matrix of order $1 \times 1$ as a scalar, and to a matrix of order $m \times 1$ with $m > 1$ as a vector. We let $f$ be a generic function of the form $y = f(x)$. If $x$ is a vector and $y$ is a scalar, we say that $f$ is a scalar function of a vector. If $x$ is a matrix and $y$ a vector, we say that $f$ is a vector function of a matrix, and so on.

General Definition: Derivative of a matrix function of a matrix

$$\frac{dy}{dx} \stackrel{\text{def}}{=} \begin{pmatrix} \frac{dy}{dx_{11}} & \cdots & \frac{dy}{dx_{1n}} \\ \vdots & & \vdots \\ \frac{dy}{dx_{m1}} & \cdots & \frac{dy}{dx_{mn}} \end{pmatrix} \tag{2}$$

where

$$\frac{dy}{dx_{ij}} \stackrel{\text{def}}{=} \begin{pmatrix} \frac{dy_{11}}{dx_{ij}} & \cdots & \frac{dy_{1q}}{dx_{ij}} \\ \vdots & & \vdots \\ \frac{dy_{p1}}{dx_{ij}} & \cdots & \frac{dy_{pq}}{dx_{ij}} \end{pmatrix} \tag{3}$$

Thus the derivative of a $p \times q$ matrix $y$ with respect to an $m \times n$ matrix $x$ is an $(mp) \times (nq)$ matrix of the following form:

$$\frac{dy}{dx} \stackrel{\text{def}}{=} \begin{pmatrix} \frac{dy_{11}}{dx_{11}} & \cdots & \frac{dy_{1q}}{dx_{11}} & \cdots & \frac{dy_{11}}{dx_{1n}} & \cdots & \frac{dy_{1q}}{dx_{1n}} \\ \vdots & & \vdots & & \vdots & & \vdots \\ \frac{dy_{p1}}{dx_{m1}} & \cdots & \frac{dy_{pq}}{dx_{m1}} & \cdots & \frac{dy_{p1}}{dx_{mn}} & \cdots & \frac{dy_{pq}}{dx_{mn}} \end{pmatrix} \tag{4}$$

From this definition we can derive a variety of important special cases.

Derivative of a scalar function of a vector. We think of a vector as a matrix with a single column and of a scalar as a matrix with a single cell.

Applying the general definition we get

$$\frac{dy}{dx} \stackrel{\text{def}}{=} \begin{pmatrix} \frac{dy}{dx_1} \\ \vdots \\ \frac{dy}{dx_m} \end{pmatrix} \tag{5}$$

Gradient of a scalar function of a vector. We represent gradients using the $\nabla$ sign. For gradients of scalar functions of a vector, we define the gradient to equal the derivative (note this will not be the case for vector functions of vectors):

$$\nabla_x y \stackrel{\text{def}}{=} \frac{dy}{dx} \tag{6}$$

Derivative of a vector function of a vector. If we think of $y$ and $x$ as $p \times 1$ and $m \times 1$ matrices, then the derivative would be a vector with $mp$ dimensions:

$$\frac{dy}{dx} \stackrel{\text{def}}{=} \Big( \frac{dy_{11}}{dx_{11}}, \ldots, \frac{dy_{p1}}{dx_{11}}, \ldots, \frac{dy_{11}}{dx_{1m}}, \ldots, \frac{dy_{p1}}{dx_{1m}} \Big)^T \tag{7}$$

Jacobian of a vector function of a vector. A more useful representation can be obtained by working with the derivative of $y$ with respect to the transpose of $x$. This results in a $p \times m$ matrix, which is known as the Jacobian of $y$ with respect to $x$:

$$J_x y \stackrel{\text{def}}{=} \frac{dy}{dx^T} = \begin{pmatrix} \frac{dy_1}{dx_1} & \cdots & \frac{dy_1}{dx_m} \\ \vdots & & \vdots \\ \frac{dy_p}{dx_1} & \cdots & \frac{dy_p}{dx_m} \end{pmatrix} \tag{8}$$

The Jacobian is very useful for linearizing vector functions of vectors:

$$f(x) \approx f(x_0) + (J_x f)(x - x_0) \tag{9}$$

Gradient of a vector function of a vector. Note that if $y$ is a scalar then $J_x y = dy/dx^T = (\nabla_x y)^T$, i.e., the Jacobian of a scalar function is the transpose of the gradient. With this in mind, we define the gradient of a vector function of a vector as the transpose of the Jacobian:

$$\nabla_x y \stackrel{\text{def}}{=} (J_x y)^T = \Big( \frac{dy}{dx^T} \Big)^T = \frac{dy^T}{dx} \tag{10}$$

Hessian of a scalar function of a vector. Let $y = f(x)$, $y \in \mathbb{R}$, $x \in \mathbb{R}^n$. The matrix

$$H_x y \stackrel{\text{def}}{=} \nabla_x^2 y \stackrel{\text{def}}{=} \nabla_x \nabla_x y = \begin{pmatrix} \frac{\partial^2 y}{\partial x_1 \partial x_1} & \cdots & \frac{\partial^2 y}{\partial x_1 \partial x_n} \\ \vdots & & \vdots \\ \frac{\partial^2 y}{\partial x_n \partial x_1} & \cdots & \frac{\partial^2 y}{\partial x_n \partial x_n} \end{pmatrix} \tag{11}$$

is called the Hessian matrix of $y$ with respect to $x$.
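To make these conventions concrete, here is a small numerical sketch (not part of the original text) that builds the Jacobian of equation (8) by finite differences and checks the linearization of equation (9). The example function, step size, and test point are arbitrary illustrations.

```python
import numpy as np

def f(x):
    """A simple vector function of a vector, f: R^2 -> R^2 (illustrative)."""
    return np.array([x[0] * x[1], np.sin(x[0])])

def jacobian_fd(f, x, eps=1e-6):
    """Finite-difference Jacobian J_x f, laid out p x m as in equation (8)."""
    fx = f(x)
    J = np.zeros((fx.size, x.size))
    for j in range(x.size):
        dx = np.zeros_like(x)
        dx[j] = eps
        J[:, j] = (f(x + dx) - fx) / eps
    return J

x0 = np.array([0.7, -1.3])
J = jacobian_fd(f, x0)

# Linearization of equation (9): f(x) is close to f(x0) + J (x - x0) near x0
x = x0 + np.array([0.01, -0.02])
print(np.allclose(f(x0) + J @ (x - x0), f(x), atol=1e-3))   # True
```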

2 Summary of definitions

For $x$, $y$ vectors:

Gradient:

$$\nabla_x y \stackrel{\text{def}}{=} \frac{dy^T}{dx} \tag{12}$$

Jacobian:

$$J_x y \stackrel{\text{def}}{=} (\nabla_x y)^T \stackrel{\text{def}}{=} \frac{dy}{dx^T} \tag{13}$$

Hessian:

$$H_x y \stackrel{\text{def}}{=} \nabla_x^2 y \stackrel{\text{def}}{=} \nabla_x \nabla_x y \tag{14}$$

3 Chain Rules

1. Vector functions of vectors: If $z = h(y)$ is a vector function of a vector and $y = f(x)$ is a vector function of a vector, then

$$J_x z = (J_y z)(J_x y) \tag{15}$$

or equivalently

$$\nabla_x z = (\nabla_x y)(\nabla_y z) \tag{16}$$

Example 1: Consider the quadratic function

$$z = y^T a y \tag{17}$$

where $y$ is an $n$-dimensional vector and $a$ is an $n \times n$ matrix. It is easy to show that

$$\frac{d}{dx_k}\, x^T a x = \sum_j x_j (a_{kj} + a_{jk}) \tag{18}$$

thus

$$J_x\, x^T a x = x^T (a + a^T) \tag{19}$$

$$\nabla_x\, x^T a x = (a + a^T)\, x \tag{20}$$

Moreover, if

$$y = bx \tag{21}$$

where $x$ is an $m$-dimensional vector and $b$ is an $n \times m$ matrix, then

$$J_x y = b \tag{22}$$

$$\nabla_x y = b^T \tag{23}$$

Combining these results with the chain rule,

$$J_x\, (bx - c)^T a (bx - c) = \big( J_{bx-c}\, (bx - c)^T a (bx - c) \big)\, \big( J_x (bx - c) \big) \tag{24}$$

$$= (bx - c)^T (a + a^T)\, b \tag{25}$$

and

$$\nabla_x\, (bx - c)^T a (bx - c) = b^T (a + a^T)(bx - c) \tag{26}$$

Example 2: Let $x$, $y$ be vectors.

$$\nabla_x\, \big( e^{x^T y} y \big) = \big( \nabla_x\, e^{x^T y} \big)\, \big( \nabla_{e^{x^T y}}\, e^{x^T y} y \big) \tag{27}$$

where

$$\nabla_x\, e^{x^T y} = \big( \nabla_x\, x^T y \big)\, \frac{d\, e^{x^T y}}{d\, x^T y} = y\, e^{x^T y} \tag{28}$$

$$\nabla_{e^{x^T y}}\, e^{x^T y} y = \frac{d\, (e^{x^T y} y)^T}{d\, e^{x^T y}} = y^T \tag{29}$$

Thus

$$\nabla_x\, e^{x^T y} y = e^{x^T y}\, y\, y^T \tag{30}$$

2. Scalar functions of matrices: If $z = h(y)$ is a scalar function of a matrix and $y = f(x)$ is a matrix function of a scalar, then

$$\frac{dz}{dx} = \text{trace}\left[ \left( \frac{dz}{dy} \right)^T \frac{dy}{dx} \right] = \text{trace}\left[ \left( \frac{dy}{dx} \right)^T \frac{dz}{dy} \right] \tag{31}$$
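The following sketch (not part of the original text) checks equations (26) and (30) numerically with finite differences; the dimensions, random seed, and tolerances are arbitrary illustrations.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 4, 3
a = rng.standard_normal((n, n))
b = rng.standard_normal((n, m))
c = rng.standard_normal(n)
x = rng.standard_normal(m)
eps = 1e-6

# Equation (26): gradient of (bx - c)^T a (bx - c) with respect to x
def rho(x):
    r = b @ x - c
    return r @ a @ r

grad_fd = np.array([(rho(x + eps * np.eye(m)[:, j]) - rho(x)) / eps for j in range(m)])
print(np.allclose(grad_fd, b.T @ (a + a.T) @ (b @ x - c), atol=1e-3))   # True

# Equation (30): gradient of the vector function e^{x^T y} y with respect to x;
# entry (i, j) of the gradient is d g_j / d x_i (the transpose of the Jacobian)
y = rng.standard_normal(m)
def g(x):
    return np.exp(x @ y) * y

grad_g_fd = np.array([(g(x + eps * np.eye(m)[:, i]) - g(x)) / eps for i in range(m)])
print(np.allclose(grad_g_fd, np.exp(x @ y) * np.outer(y, y), atol=1e-3))   # True
```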

4 Useful Formulae

(I need to check whether notation is consistent with other parts of the document.) Let $a, b, c, d, e$ be matrices and $x, y, z$ vectors.

4.1 Linear and Quadratic Functions

$$\nabla_x\, x^T c = c \tag{32}$$

$$\nabla_x\, cx = \frac{d\,(cx)^T}{dx} = c^T \tag{33}$$

$$J_x\, cx = (\nabla_x\, cx)^T = c \tag{34}$$

$$\nabla_x\, (ax + b)^T c\, (dx + e) = a^T c\, (dx + e) + d^T c^T (ax + b) \tag{35}$$

$$\nabla_x\, (ax + b)^T c\, (ax + b) = a^T (c + c^T)(ax + b) \tag{36}$$

$$\nabla_a\, (ax + b)^T c\, (ax + b) = (c + c^T)(ax + b)\, x^T \tag{37}$$

$$\nabla_a\, x^T a^T b\, a y = b^T a x y^T + b\, a y x^T \tag{38}$$

$$\nabla_a\, x^T a y = x y^T \tag{39}$$

$$\nabla_a\, x^T e^{ta} y = t\, e^{t a^T} x y^T \tag{40}$$
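As a quick numerical check of two of these formulae, the sketch below (not from the original text) compares finite-difference gradients against equations (36) and (39); here $u$ and $v$ play the roles of $x$ and $y$ in (39), and all sizes and seeds are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 3, 4
a = rng.standard_normal((n, m))
b = rng.standard_normal(n)
c = rng.standard_normal((n, n))
x = rng.standard_normal(m)
eps = 1e-6

# Equation (36): gradient of (ax + b)^T c (ax + b) with respect to x
def q(x):
    r = a @ x + b
    return r @ c @ r

grad_fd = np.array([(q(x + eps * np.eye(m)[:, j]) - q(x)) / eps for j in range(m)])
print(np.allclose(grad_fd, a.T @ (c + c.T) @ (a @ x + b), atol=1e-3))   # True

# Equation (39): gradient of u^T a v with respect to the matrix a
u = rng.standard_normal(n)
v = rng.standard_normal(m)
grad_a_fd = np.zeros((n, m))
for i in range(n):
    for j in range(m):
        da = np.zeros((n, m))
        da[i, j] = eps
        grad_a_fd[i, j] = (u @ (a + da) @ v - u @ a @ v) / eps
print(np.allclose(grad_a_fd, np.outer(u, v), atol=1e-3))   # True
```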

4.2 Traces

$$\nabla_a\, \text{trace}\,[a] = I \tag{41}$$

$$\nabla_b\, \text{trace}\,[abc] = \nabla_b\, \text{trace}\,[c^T b^T a^T] = a^T c^T \tag{42}$$

$$\nabla_b\, \text{trace}\,[a\, b^n] = \Big( \sum_{i=0}^{n-1} b^i a\, b^{n-i-1} \Big)^T \tag{43}$$

$$\nabla_e\, \text{trace}\,[a\, e\, b\, e^T c] = a^T c^T e\, b^T + c\, a\, e\, b \tag{44}$$

$$\nabla_a\, \text{trace}\,[a^T b\, a] = (b + b^T)\, a \tag{45}$$

$$\nabla_a\, \text{trace}\,[a^{-1} b] = -a^{-T} b^T a^{-T} \tag{46}$$

4.3 Determinants

$$\nabla_a\, \det\,[a] = \det\,[a]\; a^{-T} \tag{47}$$

4.4 Kronecker and Vecs

$$\nabla_{\text{vec}[b]}\, \text{vec}[abc] = c \otimes a^T \tag{48}$$

5 Matrix Differential Equations

$$\frac{d}{dt}(a_t + b_t) = \frac{da_t}{dt} + \frac{db_t}{dt} \tag{49}$$

$$\frac{d}{dt}(a_t b_t) = \frac{da_t}{dt}\, b_t + a_t\, \frac{db_t}{dt} \tag{50}$$

$$\frac{d}{dt}(a_t^2) = \frac{da_t}{dt}\, a_t + a_t\, \frac{da_t}{dt} \tag{51}$$

Since $a_t a_t^{-1} = I$,

$$\frac{d}{dt}\big(a_t a_t^{-1}\big) = 0 = \frac{da_t}{dt}\, a_t^{-1} + a_t\, \frac{d a_t^{-1}}{dt} \tag{52}$$

and thus

$$\frac{d a_t^{-1}}{dt} = -a_t^{-1}\, \frac{da_t}{dt}\, a_t^{-1} \tag{53}$$
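The sketch below (not part of the original text) verifies equation (47) and the derivative of the inverse, equation (53), by finite differences; the smooth matrix curve $a_t$ used here is an arbitrary illustration, kept close to the identity so it stays invertible.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 3
a0 = rng.standard_normal((n, n))
a1 = rng.standard_normal((n, n))
eps = 1e-6

# An illustrative smooth matrix curve a_t and its exact time derivative
a  = lambda t: np.eye(n) + 0.1 * t * a0 + 0.05 * t**2 * a1
da = lambda t: 0.1 * a0 + 0.1 * t * a1

t = 0.3
# Equation (53): d(a_t^{-1})/dt = -a_t^{-1} (da_t/dt) a_t^{-1}
lhs = (np.linalg.inv(a(t + eps)) - np.linalg.inv(a(t))) / eps
rhs = -np.linalg.inv(a(t)) @ da(t) @ np.linalg.inv(a(t))
print(np.allclose(lhs, rhs, atol=1e-3))   # True

# Equation (47): d det(a)/da = det(a) a^{-T}
m = a(t)
grad_det_fd = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        dm = np.zeros((n, n))
        dm[i, j] = eps
        grad_det_fd[i, j] = (np.linalg.det(m + dm) - np.linalg.det(m)) / eps
print(np.allclose(grad_det_fd, np.linalg.det(m) * np.linalg.inv(m).T, atol=1e-3))   # True
```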

It is only possible to expect

$$\frac{d}{dt}\, e^{a_t} = \frac{da_t}{dt}\, e^{a_t}$$

under conditions of full commutativity (i.e., when $a_t$ and $da_t/dt$ commute).

Types of linear matrix differential equations:

$$\frac{da_t}{dt} = b\, a_t\, c \tag{56}$$

with special cases when $b$ or $c$ are the identity matrix. The following tricks allow moving the coefficients to the left or to the right. If

$$\frac{da_t}{dt} = a_t\, c \tag{58}$$

then

$$\frac{d a_t^T}{dt} = c^T a_t^T \tag{59}$$

and if

$$\frac{da_t}{dt} = b\, a_t \tag{61}$$

then

$$\frac{d a_t^{-1}}{dt} = -a_t^{-1}\, \frac{da_t}{dt}\, a_t^{-1} = -a_t^{-1} b\, a_t\, a_t^{-1} = -a_t^{-1} b \tag{62}$$

It can be shown that if $a_0$ is non-singular, then the solution $a_t$ is non-singular for all $t$.
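As a numerical illustration (not part of the original text, and assuming scipy is available), the sketch below checks that $a_t = e^{tb} a_0$ satisfies $da_t/dt = b\, a_t$, and that transposing moves the coefficient to the right as described above; the matrices are arbitrary.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(3)
n = 3
b = rng.standard_normal((n, n))
a0 = rng.standard_normal((n, n))
eps = 1e-6

# a_t = e^{tb} a_0 solves the linear matrix equation da_t/dt = b a_t of (61)
a = lambda t: expm(t * b) @ a0

t = 0.4
lhs = (a(t + eps) - a(t)) / eps        # finite-difference approximation of da_t/dt
print(np.allclose(lhs, b @ a(t), atol=1e-3))          # True

# Transposing moves the coefficient to the right: d(a_t^T)/dt = a_t^T b^T
lhs_T = (a(t + eps).T - a(t).T) / eps
print(np.allclose(lhs_T, a(t).T @ b.T, atol=1e-3))    # True
```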

6 Determinants

$$\det(I + x y^T) = 1 + x^T y, \qquad x, y \in \mathbb{R}^n$$

$$\det(e^a) = e^{\text{trace}(a)}$$

7 Trace

$$\text{trace}(a + b) = \text{trace}(a) + \text{trace}(b)$$

$$\text{trace}(a) = \text{trace}(a^T)$$

$$\text{trace}(abc) = \text{trace}(cab) = \text{trace}(bca)$$

$$\text{trace}(x y^T a) = y^T a x, \qquad x \in \mathbb{R}^m,\; y \in \mathbb{R}^n,\; a \text{ of order } n \times m$$

$$\text{trace}(a^T r b^T) = a^T r b \quad \text{(confirm } r \text{ does not need rotation)}$$

If $a$ is $m \times n$ and $b$ is $n \times m$ then $\text{trace}(ab) = \text{trace}(ba) = \text{trace}(a^T b^T)$.

8 Matrix Exponentials

$$\det(e^a) = e^{\text{trace}(a)}$$

If $a = p \Lambda p^{-1}$ then $e^a = p\, e^{\Lambda} p^{-1}$.

$$\frac{d\, e^{ta}}{dt} = a\, e^{ta}$$

$$\nabla_a\, x^T e^{ta} y = t\, e^{t a^T} x y^T$$

8.1 Proofs:

$$\frac{d\, x^T e^{ta} y}{d a_{ij}} = \frac{d}{d\delta} \Big[ x^T e^{t(a + \delta 1_i 1_j^T)} y \Big]_{\delta=0} = \Big[ x^T e^{ta}\, \frac{d}{d\delta}\, e^{t\delta 1_i 1_j^T} y \Big]_{\delta=0} = \Big[ x^T e^{ta}\, t\, 1_i 1_j^T\, e^{t\delta 1_i 1_j^T} y \Big]_{\delta=0} = t\, \big(e^{t a^T} x\big)_i\, y_j \tag{64}$$

where $1_i$ denotes the $i$-th standard basis vector. Thus

$$\nabla_a\, x^T e^{ta} y = t\, e^{t a^T} x y^T \tag{65}$$

9 Matrix Logarithms

Let $a$ be a real or complex square matrix of order $n$ with positive eigenvalues. Then there is a unique matrix $b$ such that (1) $a = e^b$ and (2) the imaginary part of the eigenvalues of $b$ is in $[-\pi, \pi]$. We call $b$ the principal logarithm of $a$. If $a = p \Lambda p^{-1}$ then $\log(a) = p \log(\Lambda)\, p^{-1}$.
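The identities in Sections 6 through 9 are easy to spot-check numerically. The sketch below (not part of the original text, assuming numpy and scipy are available) does so for a few of them; the matrices are arbitrary, and the logarithm check uses a positive definite matrix so that the principal logarithm is well behaved.

```python
import numpy as np
from scipy.linalg import expm, logm

rng = np.random.default_rng(4)
n = 4
a = rng.standard_normal((n, n))
b = rng.standard_normal((n, n))
c = rng.standard_normal((n, n))

# Sections 6 and 8: det(e^a) = e^{trace(a)}
print(np.isclose(np.linalg.det(expm(a)), np.exp(np.trace(a))))             # True

# Section 7: cyclic property of the trace
print(np.isclose(np.trace(a @ b @ c), np.trace(c @ a @ b)))                # True

# Section 8: if a = p Lambda p^{-1} then e^a = p e^Lambda p^{-1}
lam, p = np.linalg.eig(a)
print(np.allclose(expm(a), p @ np.diag(np.exp(lam)) @ np.linalg.inv(p)))   # True

# Section 9: the principal logarithm inverts the exponential; we use a matrix
# with positive eigenvalues as required in the text
s = a @ a.T + n * np.eye(n)
print(np.allclose(expm(logm(s)), s))                                       # True
```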

10 Kronecker and Vec

Vectorization provides a way to deal with derivatives of matrix functions without having to use cubix or quartix objects (i.e., arrays with 3 or 4 dimensions): instead of working with matrix functions we work with vectorized versions of matrix functions. This gives rise to the Kronecker product, or tensor product.

Definition: Kronecker product

$$a \otimes b = \begin{pmatrix} a_{11} b & \cdots & a_{1n} b \\ \vdots & & \vdots \\ a_{m1} b & \cdots & a_{mn} b \end{pmatrix} \tag{66}$$

Definition: Vec operator

$$\text{vec}[a] = (a_{11}, \ldots, a_{m1}, a_{12}, \ldots, a_{m2}, \ldots, a_{1n}, \ldots, a_{mn})^T \tag{67}$$

10.1 Properties

1. $a \otimes b \otimes c = (a \otimes b) \otimes c = a \otimes (b \otimes c)$  (68), provided the dimensions of the matrices allow all the expressions to exist.

2. $(a + b) \otimes (c + d) = a \otimes c + a \otimes d + b \otimes c + b \otimes d$  (69)

3. $(a \otimes b)(c \otimes d) = (ac) \otimes (bd)$  (70)

4. $(a \otimes b)(c \otimes d) = (ac) \otimes (bd)$  (71)

5. $(a \otimes b)^T = a^T \otimes b^T$  (72)

6. $(a \otimes b)^{-1} = a^{-1} \otimes b^{-1}$  (73)

7. For vectors $a$, $b$: $\text{vec}[a b^T] = b \otimes a$  (74)

8. $\text{vec}[abc] = (c^T \otimes a)\, \text{vec}[b]$  (75)

9. If $\{\lambda_i, u_i\}$ are eigenvalue/eigenvector pairs of $a$ and $\{\delta_j, v_j\}$ are eigenvalue/eigenvector pairs of $b$, then $\{\lambda_i \delta_j,\, u_i \otimes v_j\}$ are eigenvalue/eigenvector pairs of $a \otimes b$.

10. $\det[a \otimes b] = \det[a]^m\, \det[b]^n$  (76), where $a$ and $b$ are of order $n \times n$ and $m \times m$ respectively.

11. $\text{trace}(a \otimes b) = \text{trace}[a]\; \text{trace}[b]$  (77)

12. $\nabla^2_{\text{vec}[x]}\, \text{trace}(a\, x\, b\, x^T) = \nabla^2_{\text{vec}[x]}\, \big( \text{vec}[x]^T (b^T \otimes a)\, \text{vec}[x] \big) = b^T \otimes a + b \otimes a^T$  (78), where $a$, $b$, $x$ are matrices.
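A short numerical check of some of these properties (not part of the original text; sizes and seeds are arbitrary). Note that numpy's default reshape is row-major, so the vec operator of equation (67) needs column-major (Fortran) order.

```python
import numpy as np

rng = np.random.default_rng(5)
a = rng.standard_normal((3, 4))
b = rng.standard_normal((4, 5))
c = rng.standard_normal((5, 2))

# vec stacks columns (equation (67)); numpy needs Fortran (column-major) order
vec = lambda m: m.reshape(-1, order="F")

# Property 8, equation (75): vec[abc] = (c^T kron a) vec[b]
print(np.allclose(vec(a @ b @ c), np.kron(c.T, a) @ vec(b)))                # True

# Properties 10 and 11, equations (76)-(77), for square a (n x n) and b (m x m)
sa = rng.standard_normal((3, 3))     # n = 3
sb = rng.standard_normal((4, 4))     # m = 4
print(np.isclose(np.trace(np.kron(sa, sb)), np.trace(sa) * np.trace(sb)))   # True
print(np.isclose(np.linalg.det(np.kron(sa, sb)),
                 np.linalg.det(sa) ** 4 * np.linalg.det(sb) ** 3))          # True
```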

11 Optimization of Quadratic Functions

This is arguably the most useful optimization problem in applied mathematics. Its solution is behind a large variety of useful algorithms, including multivariate linear regression, the Kalman filter, linear quadratic controllers, etc. Let

$$\rho(x) = E\big[ (bx - C)^T a\, (bx - C) \big] + x^T d\, x \tag{79}$$

where $a$ and $d$ are symmetric positive definite matrices, $b$ is a matrix, $x$ a vector, and $C$ a random vector. Taking the Jacobian with respect to $x$ and applying the chain rule, we have

$$J_x \rho = \big[ J_{bx-c}\, (bx - c)^T a (bx - c) \big]\, J_x (bx - c) + J_x\, x^T d\, x \tag{80}$$

$$= 2 (bx - c)^T a\, b + 2 x^T d \tag{81}$$

$$\nabla_x \rho = (J_x \rho)^T = 2 b^T a (bx - c) + 2 d\, x \tag{82}$$

where $c = E(C)$. Setting the gradient to zero we get

$$(b^T a b + d)\, x = b^T a\, c \tag{83}$$

This is commonly known as the Normal Equation. Thus the value $\hat{x}$ that minimizes $\rho$ is

$$\hat{x} = h\, c \tag{84}$$

where

$$h = (b^T a b + d)^{-1} b^T a \tag{85}$$

Moreover,

$$\rho(\hat{x}) = (b h c - c)^T a\, (b h c - c) + c^T h^T d\, h c \tag{86}$$

$$= c^T h^T b^T a b h c - 2\, c^T h^T b^T a c + c^T a c + c^T h^T d\, h c \tag{87}$$

Now note

$$c^T h^T b^T a b h c + c^T h^T d\, h c = c^T h^T (b^T a b + d)\, h c \tag{88}$$

$$= c^T a^T b\, (b^T a b + d)^{-1} (b^T a b + d)(b^T a b + d)^{-1} b^T a c \tag{89}$$

$$= c^T a^T b\, (b^T a b + d)^{-1} b^T a c \tag{90}$$

$$= c^T h^T b^T a c \tag{91}$$

Thus

$$\rho(\hat{x}) = c^T a c - c^T h^T b^T a c = c^T k\, c \tag{92}$$

where

$$k = a - h^T b^T a = a - a^T b\, (b^T a b + d)^{-1} b^T a \tag{93}$$

This is known as the Riccati equation, which is found in a variety of stochastic filtering and control problems.

Example Application: Ridge Regression. Let $y \in \mathbb{R}^n$ represent a set of observations on a variable we want to predict, let $b$ be an $n \times p$ matrix where each row is a set of observations on $p$ variables used to predict $y$, and let $x \in \mathbb{R}^p$ be the weights given to each variable to predict $y$. A useful measure of error is

$$\rho(x) = (bx - y)^T (bx - y) + \lambda\, x^T x \tag{94}$$

where $\lambda \geq 0$ is a constant that penalizes the use of large values of $x$. Thus the solution to this problem is

$$\hat{x} = (b^T b + \lambda I_p)^{-1} b^T y \tag{95}$$

where $I_p$ is the $p \times p$ identity matrix.
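As an illustration of the ridge regression recipe (not part of the original text; the data are synthetic and the regularization constant is arbitrary), the sketch below computes equation (95) and confirms that the gradient of $\rho$ vanishes at the solution.

```python
import numpy as np

rng = np.random.default_rng(6)
n, p = 50, 4
b = rng.standard_normal((n, p))                 # n observations on p predictors
x_true = np.array([1.0, -2.0, 0.5, 0.0])
y = b @ x_true + 0.1 * rng.standard_normal(n)   # noisy synthetic targets
lam = 0.5

# Equation (95): ridge solution x_hat = (b^T b + lambda I_p)^{-1} b^T y
x_hat = np.linalg.solve(b.T @ b + lam * np.eye(p), b.T @ y)

# The gradient of rho(x) = (bx - y)^T (bx - y) + lambda x^T x vanishes at x_hat
grad = 2 * b.T @ (b @ x_hat - y) + 2 * lam * x_hat
print(np.allclose(grad, 0, atol=1e-8))   # True
```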

12 Optimization Methods

12.1 Newton-Raphson Method

Let $y = f(x)$, with $y \in \mathbb{R}$, $x \in \mathbb{R}^n$. The Newton-Raphson algorithm is an iterative method for optimizing $y$. We start the process at an arbitrary point $x_0 \in \mathbb{R}^n$. Let $x_t \in \mathbb{R}^n$ represent the state of the algorithm at iteration $t$. We approximate the function $f$ using the linear and quadratic terms of the Taylor expansion of $f$ around $x_t$:

$$\hat{f}_t(x) = f(x_t) + \nabla_x f(x_t)^T (x - x_t) + \tfrac{1}{2} (x - x_t)^T \big( \nabla_x \nabla_x f(x_t) \big) (x - x_t) \tag{96}$$

We then find the extremum of $\hat{f}_t$ with respect to $x$ and move directly to that extremum. To do so, note that

$$\nabla_x \hat{f}_t(x) = \nabla_x f(x_t) + \big( \nabla_x \nabla_x f(x_t) \big)(x - x_t) \tag{97}$$

We let $x_{t+1}$ be the value of $x$ for which $\nabla_x \hat{f}_t(x) = 0$:

$$x_{t+1} = x_t - \big( \nabla_x \nabla_x f(x_t) \big)^{-1} \nabla_x f(x_t) \tag{98}$$

It is useful to compare the Newton-Raphson method with the standard method of gradient descent. The gradient descent iteration is defined as follows:

$$x_{t+1} = x_t - \epsilon\, I_n\, \nabla_x f(x_t) \tag{99}$$

where $\epsilon$ is a small positive constant. Thus gradient descent can be seen as a Newton-Raphson method in which the Hessian matrix is approximated by $\frac{1}{\epsilon} I_n$.

12.2 Gauss-Newton Method

Let $f(x) = \sum_{i=1}^n r_i(x)^2$, with $r_i : \mathbb{R}^n \to \mathbb{R}$. We start the process with an arbitrary point $x_0 \in \mathbb{R}^n$. Let $x_t \in \mathbb{R}^n$ represent the state of the algorithm at iteration $t$. We approximate the functions $r_i$ using the linear term of their Taylor expansion around $x_t$:

$$\hat{r}_i(x) = r_i(x_t) + \big( \nabla_x r_i(x_t) \big)^T (x - x_t) \tag{100}$$

$$\hat{f}_t(x) = \sum_{i=1}^n \big( \hat{r}_i(x) \big)^2 = \sum_{i=1}^n \big( r_i(x_t) - (\nabla_x r_i(x_t))^T x_t + (\nabla_x r_i(x_t))^T x \big)^2 \tag{101}$$

Minimizing $\hat{f}_t(x)$ is a linear least squares problem with a well-known solution. If we let $y_i = (\nabla_x r_i(x_t))^T x_t - r_i(x_t)$ and $u_i = \nabla_x r_i(x_t)$, then

$$x_{t+1} = \Big( \sum_{i=1}^n u_i u_i^T \Big)^{-1} \Big( \sum_{i=1}^n u_i y_i \Big) \tag{103}$$

Note this is equivalent to Newton-Raphson with the Hessian matrix approximated by $\sum_{i=1}^n u_i u_i^T$.
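The sketch below (not part of the original text) implements the two iterations on small illustrative problems: Newton-Raphson per equation (98) on a smooth scalar function, and Gauss-Newton per equation (103) on a toy residual vector. The objective, residuals, starting points, and iteration counts are arbitrary choices.

```python
import numpy as np

# Newton-Raphson iteration (98) for minimizing a smooth scalar function.
# The quartic below and its derivatives are illustrative, not from the original text.
def f(x):       return (x[0] - 1.0) ** 4 + (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2
def grad_f(x):  return np.array([4 * (x[0] - 1.0) ** 3 + 2 * (x[0] - 1.0),
                                 2 * (x[1] + 2.0)])
def hess_f(x):  return np.array([[12 * (x[0] - 1.0) ** 2 + 2, 0.0],
                                 [0.0, 2.0]])

x = np.zeros(2)
for _ in range(20):
    x = x - np.linalg.solve(hess_f(x), grad_f(x))    # x_{t+1} = x_t - H^{-1} grad
print(np.allclose(x, [1.0, -2.0]))    # True: converges to the minimizer

# Gauss-Newton iteration (103) for f(x) = sum_i r_i(x)^2, using residuals r_i and
# their gradients u_i = grad r_i; the r_i here are simple illustrative functions.
def residuals(x):  return np.array([x[0] * x[1] - 2.0, x[0] - 1.0, x[1] - 2.0])
def jac_r(x):      return np.array([[x[1], x[0]],      # row i holds u_i^T
                                    [1.0, 0.0],
                                    [0.0, 1.0]])

x = np.array([3.0, -1.0])
for _ in range(50):
    U = jac_r(x)                              # rows u_i^T
    y = U @ x - residuals(x)                  # y_i = u_i^T x_t - r_i(x_t)
    x = np.linalg.solve(U.T @ U, U.T @ y)     # (sum u_i u_i^T)^{-1} (sum u_i y_i)
print(np.allclose(x, [1.0, 2.0], atol=1e-6))   # True
```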

History

1. The first version of this document was written by Javier R. Movellan in May 2004, based on an Appendix from the Multivariate Logistic Regression Primer at the Kolmogorov Project. The original document was 7 pages long.