Derivatives of Functions from R n to R m

Similar documents
g(t) = f(x 1 (t),..., x n (t)).

Sec. 1.1: Basics of Vectors

MATH The Chain Rule Fall 2016 A vector function of a vector variable is a function F: R n R m. In practice, if x 1, x n is the input,

Note: Every graph is a level set (why?). But not every level set is a graph. Graphs must pass the vertical line test. (Level sets may or may not.

NOTES ON MULTIVARIABLE CALCULUS: DIFFERENTIAL CALCULUS

25. Chain Rule. Now, f is a function of t only. Expand by multiplication:

THE INVERSE FUNCTION THEOREM

The Derivative. Appendix B. B.1 The Derivative of f. Mappings from IR to IR

February 11, 2019 LECTURE 5: THE RULES OF DIFFERENTIATION.

Contents. 2 Partial Derivatives. 2.1 Limits and Continuity. Calculus III (part 2): Partial Derivatives (by Evan Dummit, 2017, v. 2.

g(2, 1) = cos(2π) + 1 = = 9

Review of Multi-Calculus (Study Guide for Spivak s CHAPTER ONE TO THREE)

REVIEW OF DIFFERENTIAL CALCULUS

Definition 5.1. A vector field v on a manifold M is map M T M such that for all x M, v(x) T x M.

Multi Variable Calculus

THEODORE VORONOV DIFFERENTIABLE MANIFOLDS. Fall Last updated: November 26, (Under construction.)

Section x7 +

Chain Rule. MATH 311, Calculus III. J. Robert Buchanan. Spring Department of Mathematics

MA2AA1 (ODE s): The inverse and implicit function theorem

The theory of manifolds Lecture 2

M311 Functions of Several Variables. CHAPTER 1. Continuity CHAPTER 2. The Bolzano Weierstrass Theorem and Compact Sets CHAPTER 3.

Math 212-Lecture 8. The chain rule with one independent variable

The Inverse Function Theorem 1

MATH 31BH Homework 5 Solutions

Differentiation. f(x + h) f(x) Lh = L.

1 Functions of Several Variables Some Examples Level Curves / Contours Functions of More Variables... 6

Mathematical Economics: Lecture 9

d(x n, x) d(x n, x nk ) + d(x nk, x) where we chose any fixed k > N

Iowa State University. Instructor: Alex Roitershtein Summer Homework #5. Solutions

z x = f x (x, y, a, b), z y = f y (x, y, a, b). F(x, y, z, z x, z y ) = 0. This is a PDE for the unknown function of two independent variables.

Section 3.5 The Implicit Function Theorem

Week 4: Differentiation for Functions of Several Variables

Several variables. x 1 x 2. x n

How to Use Calculus Like a Physicist

x 1. x n i + x 2 j (x 1, x 2, x 3 ) = x 1 j + x 3

Differentiation in higher dimensions

Lecture Notes on Metric Spaces

4 Partial Differentiation

SMSTC (2017/18) Geometry and Topology 2.

MAT 473 Intermediate Real Analysis II

f(x) f(z) c x z > 0 1

Partial Derivatives for Math 229. Our puropose here is to explain how one computes partial derivatives. We will not attempt

Implicit Functions, Curves and Surfaces

Some suggested repetition for the course MAA508

Lecture 13 - Wednesday April 29th

Math 361: Homework 1 Solutions

3.1. Derivations. Let A be a commutative k-algebra. Let M be a left A-module. A derivation of A in M is a linear map D : A M such that

Math 480 The Vector Space of Differentiable Functions

INTRODUCTION TO ALGEBRAIC GEOMETRY

Parametric Equations, Function Composition and the Chain Rule: A Worksheet

ODEs Cathal Ormond 1

MATH 251 Final Examination May 3, 2017 FORM A. Name: Student Number: Section:

VANDERBILT UNIVERSITY. MATH 2300 MULTIVARIABLE CALCULUS Practice Test 1 Solutions

Multiple Choice Answers. MA 114 Calculus II Spring 2013 Final Exam 1 May Question

X. Linearization and Newton s Method

BROUWER FIXED POINT THEOREM. Contents 1. Introduction 1 2. Preliminaries 1 3. Brouwer fixed point theorem 3 Acknowledgments 8 References 8

LECTURE 5: SMOOTH MAPS. 1. Smooth Maps

Chapter 8: Taylor s theorem and L Hospital s rule

Lecture notes for Math Multivariable Calculus

Math 117: Continuity of Functions

MATHEMATICAL ECONOMICS: OPTIMIZATION. Contents

Sometimes the domains X and Z will be the same, so this might be written:

Section 3.7. Rolle s Theorem and the Mean Value Theorem

Math 321 Final Exam. 1. Do NOT write your answers on these sheets. Nothing written on the test papers will be graded.

SARD S THEOREM ALEX WRIGHT

M4P52 Manifolds, 2016 Problem Sheet 1

February 4, 2019 LECTURE 4: THE DERIVATIVE.

n f(k) k=1 means to evaluate the function f(k) at k = 1, 2,..., n and add up the results. In other words: n f(k) = f(1) + f(2) f(n). 1 = 2n 2.

Definition 3 (Continuity). A function f is continuous at c if lim x c f(x) = f(c).

Chapter 3a Topics in differentiation. Problems in differentiation. Problems in differentiation. LC Abueg: mathematical economics

MATH 423/ Note that the algebraic operations on the right hand side are vector subtraction and scalar multiplication.

21-256: Partial differentiation

1. Nonlinear Equations. This lecture note excerpted parts from Michael Heath and Max Gunzburger. f(x) = 0

DRAFT - Math 101 Lecture Note - Dr. Said Algarni

Lecture 5: Functions : Images, Compositions, Inverses

f(s)ds, i.e. to write down the general solution in

Optimization and Calculus

MATH 105: PRACTICE PROBLEMS FOR CHAPTER 3: SPRING 2010

THE WAVE EQUATION. F = T (x, t) j + T (x + x, t) j = T (sin(θ(x, t)) + sin(θ(x + x, t)))

February 27, 2019 LECTURE 10: VECTOR FIELDS.

Chapter 1 of Calculus ++ : Differential calculus with several variables

Exercises for Multivariable Differential Calculus XM521

Chapter 4. The First Fundamental Form (Induced Metric)

n=0 ( 1)n /(n + 1) converges, but not

Course Summary Math 211

Multiple Choice Answers. MA 113 Calculus I Spring 2018 Exam 2 Tuesday, 6 March Question

10. Smooth Varieties. 82 Andreas Gathmann

Sample Questions Exam II, FS2009 Paulette Saab Calculators are neither needed nor allowed.

TANGENT VECTORS. THREE OR FOUR DEFINITIONS.

MAT137 Calculus! Lecture 6

The Heine-Borel and Arzela-Ascoli Theorems

JUST THE MATHS UNIT NUMBER PARTIAL DIFFERENTIATION 1 (Partial derivatives of the first order) A.J.Hobson

Math 111: Calculus. David Perkinson

Vector Calculus lecture notes

3 Applications of partial differentiation

1.1 Single Variable Calculus versus Multivariable Calculus Rectangular Coordinate Systems... 4

ALGEBRAIC GROUPS: PART III

2. Lie groups as manifolds. SU(2) and the three-sphere. * version 1.4 *

Selected Solutions. where cl(ω) are the cluster points of Ω and Ω are the set of boundary points.

Metric Spaces and Topology

Transcription:

Derivatives of Functions from R n to R m Marc H Mehlman Department of Mathematics University of New Haven 20 October 2014 1 Matrices Definition 11 A m n matrix, A, is an array of numbers having m rows and n columns The element of A in the i th row and j th column is denoted by A ij Scalar multiplication of a matrix is accomplished componentwise, ie (ka) ij = ka ij Matrix addition (or subtraction) of two matrices of the same dimensions is also defined componentwise, ie (A + B) ij = A ij + B ij Matrix addition is not defined between matrices of different dimensions Matrix multiplication between an m n matrix, A, and an n p matrix, B, produces an m p matrix, AB The element of AB in the i th row and j th column is obtained by taking the dot product of the i th row of A and the j th column of B, ie, [AB] ij = n A ik B kj = (i th row of A) (j th column of B) k=1 Example 12 [ 1 2 3 1 0 2 ] 1 2 1 1 1 2 2 1 3 = [ 5 7 14 3 0 5 ] 1

While matrix multiplication need not be commutative, it is associative and it does obey the distributive laws Convention 13 In the following discussions, vectors x R n will be considered as n 1 matrices, ie, column vectors Definition 14 A function f : R n R m is a linear transformation (sometimes referred to as just linear) iff for every x, y R n and c R, 1 f( x + y) = f( x) + f( y) 2 f(c x) = cf( x) The following proposition is stated without proof, thought the proof is not hard Proposition 15 Let f : R n R m be a linear transformation Then 1 f( ) is continuous and there exists N > 0 (depending on f) such that f( x) N x for all x R n 2 Matrix multiplication is linear, ie, if A is an m n matrix then g( x) def = A x is a linear transformation 3 There exists an m n matrix A such that f( x) = A x 2

2 Derivatives Assume m = n = 1, that is f : R R as in Calc I Then f f(x) f(x 0 ) (x 0 ) = f f(x) f(x 0 ) (x 0 ) = 0 x x0 x x 0 x x0 x x [ 0 ] x x0 x x0 f (x 0 ) f(x) f(x 0) = 0 x x [ 0 f (x 0 )(x x 0 ) f(x) f(x 0) x x 0 x x 0 ] = 0 f (x 0 )(x x 0 ) (f(x) f(x 0 ) = 0 x x0 x x 0 f (x 0 )(x x 0 ) (f(x) f(x 0 ) x x0 x x 0 = 0 This suggests: Definition 21 Let f : R n R m The Fréchet Derivative (Jacobian matrix or just Derivative) at x 0 is the m n matrix, Df (if it exists), that satisfies, Df( x x 0 ) (f( x) f) x x 0 = 0 (1) Actually the above definition is potentially flawed since in saying the m n matrix, Df (if it exists) it was implied that there can be at most one m n matrix that satisfies (1) Thus one needs the following theorem which states that a solution to (1) is unique Theorem 22 A solution, Df, to (1) is unique 3

Proof If A is a m n matrix that also satisfies (1) then, 0 x x0 Df( x x 0 ) A( x x 0 ) [Df( x x 0 ) (f( x) f)] [A( x x 0 ) (f( x) f)] = x x0 Df( x x 0 ) (f( x) f) A( x x 0 ) (f( x) f) + x x0 x x0 = 0 Thus for any y R n one observes that ( x 0 +t y) x 0 = t y and t 0 x 0 +t y x 0 Thus 0 = x x0 Df( x x 0 ) A( x x 0 ) Df(t y) A(t y) = t 0 t y Df y A y = t 0 y = Df( x 0) y A y y This implies that Df y = A y for all y R n which in turns implies that Df = A, which is what we wanted to show Example 23 If m = n = 1, that is f : R R, then Df(x 0 ) is just f (x 0 ) (the good old Calculus I derivative) since Df(x 0 )(x x 0 ) (f(x) f(x 0 )) = x x 0 x x 0 x x0 Df(x 0) f(x) f(x 0) x x 0 = 0 which implies that Df(x 0 ) = x x0 f(x) f(x 0 ) x x 0 = f (x 0 ) Lemma 24 Let f : R n R m be differentiable at x 0 Then there exists a function λ : R n R m such that f( x) = f + Df( x x 0 ) + λ( x) (2) where x x0 λ( x) = 0 Proof Define λ( x) def = (f( x) f( x 0)) Df( x x 0 ) 4

Then λ( ) satisfies (2) and x x0 λ( x) = 0 by (1) Since the λ( x) x x 0 is negligible near x 0, local to x 0 the function f can be thought of as a linear transformation plus a transformation If x 0 = 0 and f( 0) = 0 there is no translation and for small x, one sees that f( x) Df( 0) x, ie, locally the function can be approximated with the linear transformation x Df( 0) x Corollary 25 Let f : R n R m be differentiable at x 0 Then there exists an δ > 0 and an M > 0 so that < δ f( x) f( x 0) < M (3) Proof Defining λ( ) as in (2), let δ > 0 be such that < δ 0 λ( x) < 1 Then for < δ ( ) f( x) f x x0 = Df( x) + λ( x) Letting N be as in Proposition 15, ( ) f( x) f x x0 = Df( x) + λ( x) ( ) x x0 Df( x) + λ( x) N x x 0 + δ = N + δ We now let M = N + δ Theorem 26 (Chain Rule) If g : R n R p is differentiable at x 0 R n and f : R p R m is differentiable at g R p then (f g)( ) is differentiable at x 0 and furthermore, D(f g) = Df(g) Dg (4) 5

Proof Look at 0 Df(g( x 0)) Dg( x x 0 ) (f(g( x)) f(g)) = Df(g( x 0)) [Dg( x x 0 ) (g( x) g) + (g( x) g)] (f(g( x)) f(g)) Df(g( x 0)) [g( x) g] (f(g( x)) f(g)) + Df(g( x 0))(Dg( x x 0 )) (g( x) g) We now need to show that the two terms in the last line above go to zero as x x 0 For the first term, by Corollary 25 one can find a δ 0 > 0 and a M > 0 so that < δ 0 g( x) g( x 0) < M Given an ɛ > 0, Df(g)( y g) (f( y) f(g) y g y f(g) implies that there exists δ 1 > 0 so that y g < δ 1 Df(g( x 0))( y g) (f( y) f(g)) y f(g) = 0 < ɛ M Df(g)( y g) (f( y) f(g)) < ɛ M y f(g( x 0)) Since g( ) is differentiable, it is continuous so we can chose δ 2 > 0 so that < δ 2 g( x) g < δ 1 6

Letting δ def = min{δ 0, δ 2 } we can conclude that for < δ, Df(g) [g( x) g] (f(g( x)) f(g)) ɛ [g( x) g( x M 0)] ɛ g( x) g M ɛ M M = ɛ Since ɛ was arbitrary, the first term goes to zero For the second term, let N be as in Proposition 15 with x 0 replaced with g Then Df(g) [Dg( x x 0 ) (g( x) g)] x x 0 N [Dg( x x 0 ) (g( x) g)] x x0 = Dg( x x 0 ) (g( x) g) N = N0 = 0 x x0 Example 27 Assume m = n = p = 1, that is f, g : R R Then (4) and Example 23 gives us the ordinary chain rule, (f g) (x 0 ) = f (g(x 0 ))g (x 0 ) Definition 28 Any function f : R n R m can be written as f( x) = (f 1 ( x),, f m ( x)) where f j : R n R for 1 j m are the coordinate functions of f( ) Theorem 29 Let f : R n R m be a function where f( x) = (f 1 ( x),, f m ( x)) where f i : R n R for 1 i m If all of the partial derivatives, f i x j ( x) exist and are continuous at x 0, then Df exists and furthermore, f 1 f x 1 1 x n Df = (5) f m f x 1 m x n Proof First lets assume that m = 1, ie, f : R n R Let x 0 = x 01,, x 0n R n and let 1 j n Define g : R R n by g(x) = x 01,, x 0(j 1), x, x 0(j+1),, x 0n 7

where g(x) is x 0 except for the j th coordinate which is x Thus f 0 = (f g) (x 0j ) = Df(g(x 0j )) Dg(x 0j ) = Df = [Df] 1j x j 1 0 and we have shown (5) for m = 1 Now suppose that m is any positive integer and f : R n R m Using the case proved above, it suffices to show that f 1 f x 1 1 x n ( x x 0 ) (f( x) f) f m f x 1 m x m x x 0 Df 1 ( x 0 )( x x 0 ) f 1 ( x) f 1 Df m ( x 0 )( x x 0 ) f m ( x) f m = x x0 Df 1 ( x 0 )( x x 0 ) (f 1 ( x) f 1 ) Df m ( x 0 )( x x 0 ) (f m ( x) f m ) = x x0 m j=1 Df j( x 0 )( x x 0 ) (f j ( x) f j ) x x0 m Df j ( x 0 )( x x 0 ) (f j ( x) f j ) = x x 0 j=1 = 0 Example 210 Let f(x, y) = (x 2 + y, x cos(y), yx 2 ) and g(x, y) = (xy, x y 2 ) Then 8

since Df(x, y) = 2x 1 cos(y) x sin(y) 2xy x 2, one has D(f g)(x, y) = Df(g(x, y)) Dg(x, y) = Df(xy, x y 2 ) Dg(x, y) 2xy 1 [ ] = cos(x y 2 ) xy sin(x y 2 ) y x 2xy(x y 2 ) x 2 y 2 1 2y 2xy 2 + 1 2x 2 y 2y = y cos(x y 2 ) xy sin(x y 2 ) x cos(x y 2 ) 2xy 2 sin(x y 2 ) 2xy 2 (x y 2 ) + x 2 y 2 2x 2 y(x y 2 ) 2x 2 y 3 3 Derivatives: Another Viewpoint When considering the function f : R n R m consider a control panel with n levers in front of you These levers control m cranes in the distance Moving the j th lever up can cause more than one crane to move When the levers are in x 0 R n position, f k x j measures the movement in the k th crane when pushing the j th lever When the levers are in x 0 position, the dynamics of the entire apparatus is described by the set of partials { } fk : 1 k m & 1 j n (6) x j The matrix derivative, Df = f 1 x 1 f m x 1 f 1 x n f m x n can be thought of a bookkeeper s way of organizing the partials The m n matrix derivative contains all the information given by (6) In addition, the bookkeepers way of organizing the partials is particularly fortunate in that D(f g) corresponds 9

to Df(g) Dg, ie, the one dimensional chain rule formula works here too, with matrix multiplication taking the place of ordinary multiplication This alternate way of view the Fréchet derivative is accurate as far as it goes, but it is not how most mathematicians picture the Fréchet derivative because this bookkeeper s viewpoint does not encompasses all the important aspects of the Fréchet derivative Lost here is the notion of the Fréchet derivative, Df, being the linear transformation that best approximates f( ) at x 0 Indeed, one needs Definition 21 to obtain Lemma 24 used in the next section 4 Differentials Lemma 24 tells us that when x R n is close to x 0 R n then Df( x x 0 ) (f( x) f) 0 f( x) f + Df( x x 0 ) (7) This is quite good since the error, by Lemma 24 is just λ( x) x x 0, a quantity that goes to zero doubly fast as x x 0 If f : R n R, one is thus led to: Definition 41 Given the points x 0 = x 01,, x 0n R n and x = x 1,, x n R n and the differentiable function f : R n R, the quantity Df( x x 0 ) = f x1 (x 1 x 01 ) + + f xn (x n x 0n ) is called the total differential of f( ) and is written as df def = f x1 dx 1 + + f xn dx n where dx j def = x j x 0j If f : R n R, (7) becomes f( x) f + df One can show that the above equation is just another way of saying that a function can locally be approximated with its tangent plane 10

Equation (7) is very useful as seen in the following example Example 42 Let f(x, y) = x 2 + y 2 Find an approximate answer for f(305, 396) Solution: Let (x 0, y 0 ) = (3, 4) Then f x (x, y) = 1 2 (x2 + y 2 ) 1 2 2x f y (x, y) = 1 2 (x2 + y 2 ) 1 2 2y Thus using (7) f(305, 396) f(3, 4)+ [ f x (3, 4) f y (3, 4) ] [ 305 3 396 4 ] = 5+ [ 3 5 4 5 ] [ 1 20 1 25 ] = 4998 Using a calculator one has f(305, 396) = 49984097, so our approximation is quite good! 5 Specific Instances of Chain Rule The remaining theorems in this section are just, in truth, examples of the multivariate chain rule They are mentioned here as theorems solely to be consistent with our book Theorem 51 Let f(x, y) = z be a differentiable function Let x : R R and y : R R also be differentiable functions Then letting h(t) def = f(x(t), y(t)) one has h (t) = f x (x(t), y(t))x (t) + f y (x(t), y(t))y (t) Proof Let g(t) def = (x(t), y(t)) Then g : R R 2 and h (t) = (f g) (t) = D(f g)(t) = Df(g(t)) Dg(t) = [ f x (x(t), y(t)) f y (x(t), y(t)) ] [ ] x (t) y (t) = f x (x(t), y(t))x (t) + f y (x(t), y(t))y (t) 11

Theorem 52 Let f(x, y, z) = w be a differentiable function Let x, y, z : R R be differentiable functions Then letting h(t) def = f(x(t), y(t), z(t)) one has h (t) = f x (x(t), y(t), z(t))x (t) + f y (x(t), y(t), z(t))y (t) + f z (x(t), y(t), z(t))z (t) Proof See the proof of Theorem 51 Theorem 53 Let f : R 2 R and x, y : R 2 R be differentiable functions Then letting h(s, t) def = f(x(s, t), y(s, t)) one has h s (s, t) = f x (x(s, t), y(s, t))x s (s, t) + f y (x(s, t), y(s, t))y s (s, t) h t (s, t) = f x (x(s, t), y(s, t))x t (s, t) + f y (x(s, t), y(s, t))y t (s, t) Proof Let g(s, t) def = (x(s, t), y(s, t)) Then g : R 2 R 2 and [ hs (x(s, t), y(s, t)) h t (x(s, t), y(s, t)) ] = Dh(s, t) = D(f g)(s, t) = Df(g(s, t)) Dg(s, t) = [ f x (x(s, t), y(s, t)) f y (x(s, t), y(s, t)) ] [ x s (s, t) x t (s, t) y s (s, t) y t (s, t) After computing the above matrix multiplication, the theorem is proved by comparing the components of the first and last matrices above ] 6 Summary Definition 61 Given f : R n R m, the m n matrix, Df, that satisfies Df( x x 0 ) (f( x) f) x x 0 = 0, is called the Fréchet Derivative of f( ) at x 0 Theorem 62 (Chain Rule) If g : R n R p is differentiable at x 0 R n and f : R p R m is differentiable at g R p then (f g)( ) is differentiable at x 0 and furthermore, D(f g) = Df(g) Dg 12

Theorem 63 Let f : R n R m be a function where f( x) = (f 1 ( x),, f m ( x)) where f i : R n R for 1 i m If all of the partial derivatives, f i x j ( x) exist and are continuous at x 0, then Df exists and furthermore, f 1 f x 1 1 x n Df = f m f x 1 m x n From Lemma 24 we have the important fact that f( x) f + Df( x x 0 ) This motivates us to present the following definition Definition 64 Given the points x 0 = x 01,, x 0n R n and x = x 1,, x n R n and the differentiable function f : R n R, the quantity Df( x x 0 ) = f x1 (x 1 x 01 ) + + f xn (x n x 0n ) is called the total differential of f( ) and is written as df def = f x1 dx 1 + + f xn dx n where dx j def = x j x 0j And finally we have the three following special cases of the multivariate chain rule: Theorem 65 Let f(x, y) = z be a differentiable function Let x : R R and y : R R also be differentiable functions Then letting h(t) def = f(x(t), y(t)) one has h (t) = f x (x(t), y(t))x (t) + f y (x(t), y(t))y (t) Theorem 66 Let f(x, y, z) = w be a differentiable function Let x, y, z : R R be differentiable functions Then letting h(t) def = f(x(t), y(t), z(t)) one has h (t) = f x (x(t), y(t), z(t))x (t) + f y (x(t), y(t), z(t))y (t) + f z (x(t), y(t), z(t))z (t) 13

Theorem 67 Let f : R 2 R and x, y : R 2 R be differentiable functions Then letting h(s, t) def = f(x(s, t), y(s, t)) one has h s (s, t) = f x (x(s, t), y(s, t))x s (s, t) + f y (x(s, t), y(s, t))y s (s, t) h t (s, t) = f x (x(s, t), y(s, t))x t (s, t) + f y (x(s, t), y(s, t))y t (s, t) 7 Exercises Let f : R 2 R 3, let g : R 3 R 2, let h : R 2 R and let j : R R 2 be such that f(x, y) def = (x, xy, x sin(y)) g(1, π/3, 3/2) = (π/3, 3/4) [ ] 0 1 0 Dg(x, y, z) = z 2 0 2xz h(x, y) j(x) l(x, y) def = x 2 + y + cx def = (2x 1, 3 cos(x)) def = (h g f)(x, y) 1 Letting k(t) def = h(2t 1, 3 cos(t)), what is k (0)? 2 What is D(h j)(0)? 3 What is Df(1, π/3)? 4 What is D(g f)(1, π/3)? 5 Given that l y (1, π/3) = 6, what is c? Hint: look at Dl(1, π/3) 6 Find the approximate length of the largest stick that will fit in a box of length 201, width 197, and height 102 using differentials Compare this to the actual length 14

References [1] James R Munkres Analysis on Manifolds Addison-Wesley, Redwood City, 1991 [2] Michael Spivak Calculus on Manifolds Benjamin, New York, 1965 15