Derivatives of Functions from R n to R m Marc H Mehlman Department of Mathematics University of New Haven 20 October 2014 1 Matrices Definition 11 A m n matrix, A, is an array of numbers having m rows and n columns The element of A in the i th row and j th column is denoted by A ij Scalar multiplication of a matrix is accomplished componentwise, ie (ka) ij = ka ij Matrix addition (or subtraction) of two matrices of the same dimensions is also defined componentwise, ie (A + B) ij = A ij + B ij Matrix addition is not defined between matrices of different dimensions Matrix multiplication between an m n matrix, A, and an n p matrix, B, produces an m p matrix, AB The element of AB in the i th row and j th column is obtained by taking the dot product of the i th row of A and the j th column of B, ie, [AB] ij = n A ik B kj = (i th row of A) (j th column of B) k=1 Example 12 [ 1 2 3 1 0 2 ] 1 2 1 1 1 2 2 1 3 = [ 5 7 14 3 0 5 ] 1
While matrix multiplication need not be commutative, it is associative and it does obey the distributive laws Convention 13 In the following discussions, vectors x R n will be considered as n 1 matrices, ie, column vectors Definition 14 A function f : R n R m is a linear transformation (sometimes referred to as just linear) iff for every x, y R n and c R, 1 f( x + y) = f( x) + f( y) 2 f(c x) = cf( x) The following proposition is stated without proof, thought the proof is not hard Proposition 15 Let f : R n R m be a linear transformation Then 1 f( ) is continuous and there exists N > 0 (depending on f) such that f( x) N x for all x R n 2 Matrix multiplication is linear, ie, if A is an m n matrix then g( x) def = A x is a linear transformation 3 There exists an m n matrix A such that f( x) = A x 2
2 Derivatives Assume m = n = 1, that is f : R R as in Calc I Then f f(x) f(x 0 ) (x 0 ) = f f(x) f(x 0 ) (x 0 ) = 0 x x0 x x 0 x x0 x x [ 0 ] x x0 x x0 f (x 0 ) f(x) f(x 0) = 0 x x [ 0 f (x 0 )(x x 0 ) f(x) f(x 0) x x 0 x x 0 ] = 0 f (x 0 )(x x 0 ) (f(x) f(x 0 ) = 0 x x0 x x 0 f (x 0 )(x x 0 ) (f(x) f(x 0 ) x x0 x x 0 = 0 This suggests: Definition 21 Let f : R n R m The Fréchet Derivative (Jacobian matrix or just Derivative) at x 0 is the m n matrix, Df (if it exists), that satisfies, Df( x x 0 ) (f( x) f) x x 0 = 0 (1) Actually the above definition is potentially flawed since in saying the m n matrix, Df (if it exists) it was implied that there can be at most one m n matrix that satisfies (1) Thus one needs the following theorem which states that a solution to (1) is unique Theorem 22 A solution, Df, to (1) is unique 3
Proof If A is a m n matrix that also satisfies (1) then, 0 x x0 Df( x x 0 ) A( x x 0 ) [Df( x x 0 ) (f( x) f)] [A( x x 0 ) (f( x) f)] = x x0 Df( x x 0 ) (f( x) f) A( x x 0 ) (f( x) f) + x x0 x x0 = 0 Thus for any y R n one observes that ( x 0 +t y) x 0 = t y and t 0 x 0 +t y x 0 Thus 0 = x x0 Df( x x 0 ) A( x x 0 ) Df(t y) A(t y) = t 0 t y Df y A y = t 0 y = Df( x 0) y A y y This implies that Df y = A y for all y R n which in turns implies that Df = A, which is what we wanted to show Example 23 If m = n = 1, that is f : R R, then Df(x 0 ) is just f (x 0 ) (the good old Calculus I derivative) since Df(x 0 )(x x 0 ) (f(x) f(x 0 )) = x x 0 x x 0 x x0 Df(x 0) f(x) f(x 0) x x 0 = 0 which implies that Df(x 0 ) = x x0 f(x) f(x 0 ) x x 0 = f (x 0 ) Lemma 24 Let f : R n R m be differentiable at x 0 Then there exists a function λ : R n R m such that f( x) = f + Df( x x 0 ) + λ( x) (2) where x x0 λ( x) = 0 Proof Define λ( x) def = (f( x) f( x 0)) Df( x x 0 ) 4
Then λ( ) satisfies (2) and x x0 λ( x) = 0 by (1) Since the λ( x) x x 0 is negligible near x 0, local to x 0 the function f can be thought of as a linear transformation plus a transformation If x 0 = 0 and f( 0) = 0 there is no translation and for small x, one sees that f( x) Df( 0) x, ie, locally the function can be approximated with the linear transformation x Df( 0) x Corollary 25 Let f : R n R m be differentiable at x 0 Then there exists an δ > 0 and an M > 0 so that < δ f( x) f( x 0) < M (3) Proof Defining λ( ) as in (2), let δ > 0 be such that < δ 0 λ( x) < 1 Then for < δ ( ) f( x) f x x0 = Df( x) + λ( x) Letting N be as in Proposition 15, ( ) f( x) f x x0 = Df( x) + λ( x) ( ) x x0 Df( x) + λ( x) N x x 0 + δ = N + δ We now let M = N + δ Theorem 26 (Chain Rule) If g : R n R p is differentiable at x 0 R n and f : R p R m is differentiable at g R p then (f g)( ) is differentiable at x 0 and furthermore, D(f g) = Df(g) Dg (4) 5
Proof Look at 0 Df(g( x 0)) Dg( x x 0 ) (f(g( x)) f(g)) = Df(g( x 0)) [Dg( x x 0 ) (g( x) g) + (g( x) g)] (f(g( x)) f(g)) Df(g( x 0)) [g( x) g] (f(g( x)) f(g)) + Df(g( x 0))(Dg( x x 0 )) (g( x) g) We now need to show that the two terms in the last line above go to zero as x x 0 For the first term, by Corollary 25 one can find a δ 0 > 0 and a M > 0 so that < δ 0 g( x) g( x 0) < M Given an ɛ > 0, Df(g)( y g) (f( y) f(g) y g y f(g) implies that there exists δ 1 > 0 so that y g < δ 1 Df(g( x 0))( y g) (f( y) f(g)) y f(g) = 0 < ɛ M Df(g)( y g) (f( y) f(g)) < ɛ M y f(g( x 0)) Since g( ) is differentiable, it is continuous so we can chose δ 2 > 0 so that < δ 2 g( x) g < δ 1 6
Letting δ def = min{δ 0, δ 2 } we can conclude that for < δ, Df(g) [g( x) g] (f(g( x)) f(g)) ɛ [g( x) g( x M 0)] ɛ g( x) g M ɛ M M = ɛ Since ɛ was arbitrary, the first term goes to zero For the second term, let N be as in Proposition 15 with x 0 replaced with g Then Df(g) [Dg( x x 0 ) (g( x) g)] x x 0 N [Dg( x x 0 ) (g( x) g)] x x0 = Dg( x x 0 ) (g( x) g) N = N0 = 0 x x0 Example 27 Assume m = n = p = 1, that is f, g : R R Then (4) and Example 23 gives us the ordinary chain rule, (f g) (x 0 ) = f (g(x 0 ))g (x 0 ) Definition 28 Any function f : R n R m can be written as f( x) = (f 1 ( x),, f m ( x)) where f j : R n R for 1 j m are the coordinate functions of f( ) Theorem 29 Let f : R n R m be a function where f( x) = (f 1 ( x),, f m ( x)) where f i : R n R for 1 i m If all of the partial derivatives, f i x j ( x) exist and are continuous at x 0, then Df exists and furthermore, f 1 f x 1 1 x n Df = (5) f m f x 1 m x n Proof First lets assume that m = 1, ie, f : R n R Let x 0 = x 01,, x 0n R n and let 1 j n Define g : R R n by g(x) = x 01,, x 0(j 1), x, x 0(j+1),, x 0n 7
where g(x) is x 0 except for the j th coordinate which is x Thus f 0 = (f g) (x 0j ) = Df(g(x 0j )) Dg(x 0j ) = Df = [Df] 1j x j 1 0 and we have shown (5) for m = 1 Now suppose that m is any positive integer and f : R n R m Using the case proved above, it suffices to show that f 1 f x 1 1 x n ( x x 0 ) (f( x) f) f m f x 1 m x m x x 0 Df 1 ( x 0 )( x x 0 ) f 1 ( x) f 1 Df m ( x 0 )( x x 0 ) f m ( x) f m = x x0 Df 1 ( x 0 )( x x 0 ) (f 1 ( x) f 1 ) Df m ( x 0 )( x x 0 ) (f m ( x) f m ) = x x0 m j=1 Df j( x 0 )( x x 0 ) (f j ( x) f j ) x x0 m Df j ( x 0 )( x x 0 ) (f j ( x) f j ) = x x 0 j=1 = 0 Example 210 Let f(x, y) = (x 2 + y, x cos(y), yx 2 ) and g(x, y) = (xy, x y 2 ) Then 8
since Df(x, y) = 2x 1 cos(y) x sin(y) 2xy x 2, one has D(f g)(x, y) = Df(g(x, y)) Dg(x, y) = Df(xy, x y 2 ) Dg(x, y) 2xy 1 [ ] = cos(x y 2 ) xy sin(x y 2 ) y x 2xy(x y 2 ) x 2 y 2 1 2y 2xy 2 + 1 2x 2 y 2y = y cos(x y 2 ) xy sin(x y 2 ) x cos(x y 2 ) 2xy 2 sin(x y 2 ) 2xy 2 (x y 2 ) + x 2 y 2 2x 2 y(x y 2 ) 2x 2 y 3 3 Derivatives: Another Viewpoint When considering the function f : R n R m consider a control panel with n levers in front of you These levers control m cranes in the distance Moving the j th lever up can cause more than one crane to move When the levers are in x 0 R n position, f k x j measures the movement in the k th crane when pushing the j th lever When the levers are in x 0 position, the dynamics of the entire apparatus is described by the set of partials { } fk : 1 k m & 1 j n (6) x j The matrix derivative, Df = f 1 x 1 f m x 1 f 1 x n f m x n can be thought of a bookkeeper s way of organizing the partials The m n matrix derivative contains all the information given by (6) In addition, the bookkeepers way of organizing the partials is particularly fortunate in that D(f g) corresponds 9
to Df(g) Dg, ie, the one dimensional chain rule formula works here too, with matrix multiplication taking the place of ordinary multiplication This alternate way of view the Fréchet derivative is accurate as far as it goes, but it is not how most mathematicians picture the Fréchet derivative because this bookkeeper s viewpoint does not encompasses all the important aspects of the Fréchet derivative Lost here is the notion of the Fréchet derivative, Df, being the linear transformation that best approximates f( ) at x 0 Indeed, one needs Definition 21 to obtain Lemma 24 used in the next section 4 Differentials Lemma 24 tells us that when x R n is close to x 0 R n then Df( x x 0 ) (f( x) f) 0 f( x) f + Df( x x 0 ) (7) This is quite good since the error, by Lemma 24 is just λ( x) x x 0, a quantity that goes to zero doubly fast as x x 0 If f : R n R, one is thus led to: Definition 41 Given the points x 0 = x 01,, x 0n R n and x = x 1,, x n R n and the differentiable function f : R n R, the quantity Df( x x 0 ) = f x1 (x 1 x 01 ) + + f xn (x n x 0n ) is called the total differential of f( ) and is written as df def = f x1 dx 1 + + f xn dx n where dx j def = x j x 0j If f : R n R, (7) becomes f( x) f + df One can show that the above equation is just another way of saying that a function can locally be approximated with its tangent plane 10
Equation (7) is very useful as seen in the following example Example 42 Let f(x, y) = x 2 + y 2 Find an approximate answer for f(305, 396) Solution: Let (x 0, y 0 ) = (3, 4) Then f x (x, y) = 1 2 (x2 + y 2 ) 1 2 2x f y (x, y) = 1 2 (x2 + y 2 ) 1 2 2y Thus using (7) f(305, 396) f(3, 4)+ [ f x (3, 4) f y (3, 4) ] [ 305 3 396 4 ] = 5+ [ 3 5 4 5 ] [ 1 20 1 25 ] = 4998 Using a calculator one has f(305, 396) = 49984097, so our approximation is quite good! 5 Specific Instances of Chain Rule The remaining theorems in this section are just, in truth, examples of the multivariate chain rule They are mentioned here as theorems solely to be consistent with our book Theorem 51 Let f(x, y) = z be a differentiable function Let x : R R and y : R R also be differentiable functions Then letting h(t) def = f(x(t), y(t)) one has h (t) = f x (x(t), y(t))x (t) + f y (x(t), y(t))y (t) Proof Let g(t) def = (x(t), y(t)) Then g : R R 2 and h (t) = (f g) (t) = D(f g)(t) = Df(g(t)) Dg(t) = [ f x (x(t), y(t)) f y (x(t), y(t)) ] [ ] x (t) y (t) = f x (x(t), y(t))x (t) + f y (x(t), y(t))y (t) 11
Theorem 52 Let f(x, y, z) = w be a differentiable function Let x, y, z : R R be differentiable functions Then letting h(t) def = f(x(t), y(t), z(t)) one has h (t) = f x (x(t), y(t), z(t))x (t) + f y (x(t), y(t), z(t))y (t) + f z (x(t), y(t), z(t))z (t) Proof See the proof of Theorem 51 Theorem 53 Let f : R 2 R and x, y : R 2 R be differentiable functions Then letting h(s, t) def = f(x(s, t), y(s, t)) one has h s (s, t) = f x (x(s, t), y(s, t))x s (s, t) + f y (x(s, t), y(s, t))y s (s, t) h t (s, t) = f x (x(s, t), y(s, t))x t (s, t) + f y (x(s, t), y(s, t))y t (s, t) Proof Let g(s, t) def = (x(s, t), y(s, t)) Then g : R 2 R 2 and [ hs (x(s, t), y(s, t)) h t (x(s, t), y(s, t)) ] = Dh(s, t) = D(f g)(s, t) = Df(g(s, t)) Dg(s, t) = [ f x (x(s, t), y(s, t)) f y (x(s, t), y(s, t)) ] [ x s (s, t) x t (s, t) y s (s, t) y t (s, t) After computing the above matrix multiplication, the theorem is proved by comparing the components of the first and last matrices above ] 6 Summary Definition 61 Given f : R n R m, the m n matrix, Df, that satisfies Df( x x 0 ) (f( x) f) x x 0 = 0, is called the Fréchet Derivative of f( ) at x 0 Theorem 62 (Chain Rule) If g : R n R p is differentiable at x 0 R n and f : R p R m is differentiable at g R p then (f g)( ) is differentiable at x 0 and furthermore, D(f g) = Df(g) Dg 12
Theorem 63 Let f : R n R m be a function where f( x) = (f 1 ( x),, f m ( x)) where f i : R n R for 1 i m If all of the partial derivatives, f i x j ( x) exist and are continuous at x 0, then Df exists and furthermore, f 1 f x 1 1 x n Df = f m f x 1 m x n From Lemma 24 we have the important fact that f( x) f + Df( x x 0 ) This motivates us to present the following definition Definition 64 Given the points x 0 = x 01,, x 0n R n and x = x 1,, x n R n and the differentiable function f : R n R, the quantity Df( x x 0 ) = f x1 (x 1 x 01 ) + + f xn (x n x 0n ) is called the total differential of f( ) and is written as df def = f x1 dx 1 + + f xn dx n where dx j def = x j x 0j And finally we have the three following special cases of the multivariate chain rule: Theorem 65 Let f(x, y) = z be a differentiable function Let x : R R and y : R R also be differentiable functions Then letting h(t) def = f(x(t), y(t)) one has h (t) = f x (x(t), y(t))x (t) + f y (x(t), y(t))y (t) Theorem 66 Let f(x, y, z) = w be a differentiable function Let x, y, z : R R be differentiable functions Then letting h(t) def = f(x(t), y(t), z(t)) one has h (t) = f x (x(t), y(t), z(t))x (t) + f y (x(t), y(t), z(t))y (t) + f z (x(t), y(t), z(t))z (t) 13
Theorem 67 Let f : R 2 R and x, y : R 2 R be differentiable functions Then letting h(s, t) def = f(x(s, t), y(s, t)) one has h s (s, t) = f x (x(s, t), y(s, t))x s (s, t) + f y (x(s, t), y(s, t))y s (s, t) h t (s, t) = f x (x(s, t), y(s, t))x t (s, t) + f y (x(s, t), y(s, t))y t (s, t) 7 Exercises Let f : R 2 R 3, let g : R 3 R 2, let h : R 2 R and let j : R R 2 be such that f(x, y) def = (x, xy, x sin(y)) g(1, π/3, 3/2) = (π/3, 3/4) [ ] 0 1 0 Dg(x, y, z) = z 2 0 2xz h(x, y) j(x) l(x, y) def = x 2 + y + cx def = (2x 1, 3 cos(x)) def = (h g f)(x, y) 1 Letting k(t) def = h(2t 1, 3 cos(t)), what is k (0)? 2 What is D(h j)(0)? 3 What is Df(1, π/3)? 4 What is D(g f)(1, π/3)? 5 Given that l y (1, π/3) = 6, what is c? Hint: look at Dl(1, π/3) 6 Find the approximate length of the largest stick that will fit in a box of length 201, width 197, and height 102 using differentials Compare this to the actual length 14
References [1] James R Munkres Analysis on Manifolds Addison-Wesley, Redwood City, 1991 [2] Michael Spivak Calculus on Manifolds Benjamin, New York, 1965 15