Differentiation of Multivariable Functions


1 Introduction

Beginning calculus students identify the derivative of a function either in terms of slope or instantaneous rate of change. When thinking of the former they say something like, "$f'(a)$ is the slope of the line that is tangent to the graph of the function $f$ at the point $(a, f(a))$." Can this idea be extended to functions $f : \mathbb{R}^n \to \mathbb{R}$? For example, Figure 1 shows the graph of $f(x, y) = \cos x \sin y + 2$ for $(x, y) \in [-\pi, \pi] \times [-\pi, \pi]$. The point $(1, 2, f(1, 2)) \approx (1, 2, 2.5)$ is represented by the dark dot on the graph. What would it mean to talk about a slope at that point?

[Figure 1: What is the slope at $(1, 2, f(1, 2)) \approx (1, 2, 2.5)$?]

Answering this question might seem like a tough row to hoe. To make some headway we will revisit the derivative of elementary calculus and see if we can formulate it in a way that might apply to multivariable functions as well. We begin by looking at functions that are not differentiable. Consider the function $f(x) = (x - 1)^{2/3} + 2$, where $-5 \le x \le 5$. Figure 2 shows its graph.

[Figure 2: $f(x) = (x - 1)^{2/3} + 2$]

Why isn't this function differentiable when $x = 1$? A common answer is that there is no tangent line to the graph of $f$ at the point $(1, f(1)) = (1, 2)$. This response, however, seems strange: as Figure 3a illustrates, there are many tangent lines we can draw at that point. What is the difference between the situation when $x = 1$ and when $x$ equals other points, where the function is differentiable? For example, consider the point $x = 2$. We readily compute $f'(x) = \frac{2}{3}(x - 1)^{-1/3}$, so $f'(2) = \frac{2}{3}$. Figure 3b shows the tangent line drawn at $(2, 3)$.

[Figure 3: A function that is not differentiable at $x = 1$ but differentiable at $x = 2$. (a) No unique tangent line at $(1, 2)$. (b) The tangent line drawn at $(2, 3)$.]

The difference between the tangent line at $(2, 3)$ and any tangent line we draw at $(1, 2)$ can be explained by how well the tangent lines fit the function at the points in question. For points near $x = 2$, the tangent line is a very good approximation to the function. For points near $x = 1$, none of the candidate tangent lines is an especially good approximation to the function. This notion (goodness of fit) can be translated into algebraic terms, and in such a way that the translation generalizes nicely to functions of more than one variable.

Single Variable Goodness of Fit

The key idea is to look at the limit definition of the derivative in the language of approximations. When we write

$$f'(x_0) = \lim_{x \to x_0} \frac{f(x) - f(x_0)}{x - x_0}$$

we mean that, if $x$ is sufficiently close to $x_0$, then $f'(x_0)$ approximately equals $\frac{f(x) - f(x_0)}{x - x_0}$ (notation: $f'(x_0) \approx \frac{f(x) - f(x_0)}{x - x_0}$). Put another way, if $x$ is sufficiently close to $x_0$, then $f(x) \approx f(x_0) + f'(x_0)(x - x_0)$. Now, $y = f(x_0) + f'(x_0)(x - x_0)$ is a linear function, and it is the equation of the line tangent to the graph of the function $f$ at the point $(x_0, f(x_0))$. Using the example of $f(x) = (x - 1)^{2/3} + 2$ and $x_0 = 2$, we compute the equation of the tangent line to be $y = 3 + \frac{2}{3}(x - 2)$. In general we designate the linear function we get by $L(x)$, so that $y = L(x) = f(x_0) + f'(x_0)(x - x_0)$. When $x$ is near $x_0$, the function $f$ is only approximated by the function $L$ (unless $f$ is itself a linear function, in which case $f$ would be identical with $L$).

What we need is a way to decide when the approximation is good enough to warrant our dubbing the function with the lofty title of "differentiable at $x_0$." The appropriate decision rule comes from noticing that $f(x) - L(x)$ is the difference in $y$-values between the function and the tangent line at the point $x$. We designate this difference by $e(x)$, and call $e(x)$ the error term. Figure 4 illustrates this idea using the points (a) $x_0 = 1$ and (b) $x_0 = 2$, respectively. The red dotted line is the tangent line, and the black line is the graph of $e(x)$.
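
To make the error term concrete, the following is a minimal numerical sketch (not part of the original notes) that compares $e(x) = f(x) - L(x)$ with the distance $|x - x_0|$ for $f(x) = (x - 1)^{2/3} + 2$. At $x_0 = 2$ it uses the tangent line $y = 3 + \frac{2}{3}(x - 2)$ computed above. At $x_0 = 1$ no particular line is singled out in the text, so the horizontal line $y = 2$ is used purely as an assumed candidate. The helper names are hypothetical.

```python
def f(x):
    # f(x) = (x - 1)^(2/3) + 2, written so the cube root stays real for x < 1
    return ((x - 1) ** 2) ** (1 / 3) + 2


def tangent_at_2(x):
    # Tangent line at x0 = 2 from the text: y = 3 + (2/3)(x - 2)
    return 3 + (2 / 3) * (x - 2)


def candidate_at_1(x):
    # ASSUMPTION: an arbitrary candidate line through (1, 2); the text fixes none
    return 2.0


def error_ratio(line, x0, h):
    # Error term e(x) = f(x) - L(x) at x = x0 + h, divided by the distance |h|
    x = x0 + h
    return (f(x) - line(x)) / abs(h)


for h in (0.1, 0.01, 0.001):
    print(h, error_ratio(tangent_at_2, 2, h), error_ratio(candidate_at_1, 1, h))
```

Under these assumptions the middle column should shrink toward zero as $h$ shrinks, while the last column should grow, which is the numerical counterpart of a good fit versus a poor one.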

[Figure 4: Error functions (in black) associated with the points (a) $x_0 = 1$ and (b) $x_0 = 2$. (a) A poor error function when $x_0 = 1$. (b) A good error function when $x_0 = 2$.]

In Figure 4b, as $x \to x_0$ (i.e., as $x \to 2$) the error term $e(x)$ approaches zero much more quickly than does the distance between $x$ and $x_0$. In other words, $\lim_{x \to x_0} \frac{e(x)}{x - x_0} = 0$. This is not the case when $x \to 1$ in Figure 4a. It can be shown that the stipulation $\lim_{x \to x_0} \frac{e(x)}{x - x_0} = 0$ is logically equivalent to the claim that a function is differentiable. To see how note that, for $x \ne x_0$,

$$\frac{e(x)}{x - x_0} = \frac{f(x) - L(x)}{x - x_0} = \frac{f(x) - [f(x_0) + f'(x_0)(x - x_0)]}{x - x_0} = \frac{f(x) - f(x_0)}{x - x_0} - \frac{f'(x_0)(x - x_0)}{x - x_0} = \frac{f(x) - f(x_0)}{x - x_0} - f'(x_0),$$

and that if $f$ is differentiable at $x_0$ the last expression will approach zero as $x$ approaches $x_0$.

Thus, a function $f : \mathbb{R} \to \mathbb{R}$ is differentiable at the point $x_0$ just in case there exists a linear function $L : \mathbb{R} \to \mathbb{R}$ such that $f(x) = L(x) + e(x)$, where $\lim_{x \to x_0} \frac{e(x)}{x - x_0} = 0$. In other words, there is a linear function that serves as a very good approximation to the function $f$ for points $x$ near $x_0$.

Exercise 1.1 Find the functions $L(x)$ and $e(x)$ that show $f(x) = x^2$ is differentiable at the point $x_0 = 3$. Be sure to justify your assertions by showing $\lim_{x \to x_0} \frac{e(x)}{x - x_0} = 0$.

2 Derivatives in Higher Dimensions

The spade work we did with single variable functions has loosened the soil to the point where we can now see how to define differentiability in higher dimensions. We focus on $f : \mathbb{R}^2 \to \mathbb{R}$ and leave as an exercise the corresponding definition for $n$ dimensions. Recall that in $\mathbb{R}^2$ a linear function describes a plane, and has the general form $z = L(x, y) = ax + by + c$. In what follows we will consider points as vectors drawn from the origin, so that the point $(x_0, y_0)$ will be equated with $\vec{x}_0 = \langle x_0, y_0 \rangle$, and the point $(x, y)$ will be equated with $\vec{x} = \langle x, y \rangle$.

Definition 2.1 $f : \mathbb{R}^2 \to \mathbb{R}$ is differentiable at $\vec{x}_0 = (x_0, y_0) \in \mathbb{R}^2$ provided there exists a linear function $L(\vec{x}) = L(x, y) = ax + by + c$ such that $f(\vec{x}) = L(\vec{x}) + e(\vec{x})$, where

$$\lim_{\vec{x} \to \vec{x}_0} \frac{e(\vec{x})}{\|\vec{x} - \vec{x}_0\|} = 0.$$

Geometrically, a function $f : \mathbb{R}^2 \to \mathbb{R}$ is differentiable at the point $\vec{x}_0$ when there exists a tangent plane that serves as a good approximation to the function $f$ for points $\vec{x} = (x, y)$ near $\vec{x}_0 = (x_0, y_0)$. With $\vec{x}_0 = (x_0, y_0) = (1, 2)$, Figure 5 shows the tangent plane to the graph of $f(x, y) = \cos x \sin y + 2$ at the point $(x_0, y_0, f(x_0, y_0)) = (1, 2, f(1, 2)) \approx (1, 2, 2.5)$.

[Figure 5: The tangent plane drawn at $(1, 2, f(1, 2)) \approx (1, 2, 2.5)$]

If $\vec{x} = (x, y)$ is near the point $\vec{x}_0 = (1, 2)$ then $L(x, y)$ will be close to $f(x, y)$, and so the point $(x, y, L(x, y))$ on the tangent plane will be close to the corresponding point $(x, y, f(x, y))$ on the graph of $f$. Figure 6 illustrates this fact for a point $(x, y)$ near $(1, 2)$. Following the vertical dashed line up from $(x, y)$ leads first to the point $(x, y, f(x, y))$, which is on the graph of the function. The next point is the approximating point $(x, y, L(x, y))$, which is on the tangent plane.

[Figure 6: The point $(x, y, L(x, y))$ is close to the point $(x, y, f(x, y))$]

Exercise 2.1 Define what it means to say that $f : \mathbb{R}^n \to \mathbb{R}$ is differentiable at the point $\vec{x}_0 \in \mathbb{R}^n$.

In the next section we will see, given a function $f : \mathbb{R}^n \to \mathbb{R}$, how to construct the corresponding approximation function $L$. We close here with a verification that a given linear function $L$ (meant to serve as an approximation to a function $f$ at a particular point) satisfies Definition 2.1.

Example 2.1 Show that the function $f(\vec{x}) = f(x, y) = x^2 + 2y^3 + 6$ is differentiable at $\vec{x}_0 = (2, 1)$ by showing that $L(\vec{x}) = L(x, y) = 4x + 6y - 2$ satisfies the requirements of Definition 2.1.

Solution. We compute

$$0 \le \lim_{\vec{x} \to \vec{x}_0} \frac{|e(\vec{x})|}{\|\vec{x} - \vec{x}_0\|} = \lim_{\vec{x} \to \vec{x}_0} \frac{|f(\vec{x}) - L(\vec{x})|}{\|\vec{x} - \vec{x}_0\|}$$
$$= \lim_{(x,y) \to (2,1)} \frac{|(x^2 + 2y^3 + 6) - (4x + 6y - 2)|}{\|(x, y) - (2, 1)\|}$$
$$= \lim_{(x,y) \to (2,1)} \frac{|x^2 - 4x + 2y^3 - 6y + 8|}{\sqrt{(x - 2)^2 + (y - 1)^2}}$$
$$= \lim_{(x,y) \to (2,1)} \frac{|(x - 2)^2 + 2(y - 1)^2 (y + 2)|}{\sqrt{(x - 2)^2 + (y - 1)^2}}$$
$$= \lim_{(x,y) \to (2,1)} \frac{(x - 2)^2 + 2(y - 1)^2 (y + 2)}{\sqrt{(x - 2)^2 + (y - 1)^2}} \quad \text{(Why can we remove the absolute value signs?)}$$
$$\le \lim_{(x,y) \to (2,1)} \frac{(x - 2)^2 + 8(y - 1)^2}{\sqrt{(x - 2)^2 + (y - 1)^2}} \quad \text{(Provided } y \text{ is close to 1. Why?)}$$
$$\le \lim_{(x,y) \to (2,1)} \frac{8(x - 2)^2 + 8(y - 1)^2}{\sqrt{(x - 2)^2 + (y - 1)^2}}$$
$$= \lim_{(x,y) \to (2,1)} 8\sqrt{(x - 2)^2 + (y - 1)^2} = 0.$$

The above inequalities give $0 \le \lim_{\vec{x} \to \vec{x}_0} \frac{|e(\vec{x})|}{\|\vec{x} - \vec{x}_0\|} \le 0$. The only way this last expression can be satisfied is to have $\lim_{\vec{x} \to \vec{x}_0} \frac{|e(\vec{x})|}{\|\vec{x} - \vec{x}_0\|} = 0$, and this condition is sufficient to guarantee that $\lim_{\vec{x} \to \vec{x}_0} \frac{e(\vec{x})}{\|\vec{x} - \vec{x}_0\|} = 0$.

Exercise 2.2 Answer the absolute value question posed in the above solution.

Exercise 2.3 Answer the "$y$ is close to 1" question posed in the above solution.

Exercise 2.4 Prove that the function $f(\vec{x}) = f(x, y) = 3x^2 + y + 2$ is differentiable at $\vec{x}_0 = (2, 1)$ by showing that $L(\vec{x}) = L(x, y) = 12x + y - 10$ satisfies the requirements of Definition 2.1.
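
As a rough numerical companion to Example 2.1 (a sketch only, not a proof and not part of the original notes), the ratio $|e(\vec{x})| / \|\vec{x} - \vec{x}_0\|$ can be evaluated at points approaching $(2, 1)$ and compared with the bound $8\|\vec{x} - \vec{x}_0\|$ that appears in the solution. The list of approach directions is an arbitrary choice made for illustration.

```python
import math


def f(x, y):
    # Function of Example 2.1
    return x**2 + 2 * y**3 + 6


def L(x, y):
    # Proposed linear approximation at (2, 1)
    return 4 * x + 6 * y - 2


def error_ratio(x, y, x0=2.0, y0=1.0):
    # |e(x, y)| / ||(x, y) - (x0, y0)|| with e = f - L
    return abs(f(x, y) - L(x, y)) / math.hypot(x - x0, y - y0)


# Unit directions of approach, chosen arbitrarily for illustration
directions = [(1, 0), (0, 1), (-1, 0), (0, -1), (0.6, 0.8), (-0.6, 0.8)]
for r in (0.1, 0.01, 0.001):
    worst = max(error_ratio(2 + r * dx, 1 + r * dy) for dx, dy in directions)
    print(r, worst, 8 * r)  # distance, largest ratio found, bound from the solution
```

A finite sample of directions cannot establish the limit, but it does show the ratio staying under the bound and shrinking with the distance.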

3 Finding the Derivative

3.1 Motivation

In the last section we defined when a function $f : \mathbb{R}^n \to \mathbb{R}$ is differentiable at $\vec{x}_0$, namely, when there is a linear function $L : \mathbb{R}^n \to \mathbb{R}$ that is a good approximation to $f$ for points $\vec{x}$ near $\vec{x}_0$. We never said, however, what the derivative was. For functions $f : \mathbb{R} \to \mathbb{R}$, we have

$$f(x) \approx L(x) = f(x_0) + f'(x_0)(x - x_0), \quad (1)$$

and the derivative itself is the quantity $f'(x_0)$. In this section we seek an analogous expression for $L$ in higher dimensions.

We start by assuming a function $f : \mathbb{R}^2 \to \mathbb{R}$ is differentiable at $\vec{x}_0$. Thus, there is a linear approximation $L(\vec{x}) = L(x, y) = ax + by + c$ that satisfies the requirements of Definition 2.1. Now, $L$ is the equation of a tangent plane that touches the graph of the function $f$ at the point of tangency. Thus, $f(x_0, y_0)$ must equal $L(x_0, y_0)$, which in turn equals $ax_0 + by_0 + c$. Therefore,

$$L(x, y) = (ax_0 + by_0 + c) + (ax + by + c) - (ax_0 + by_0 + c),$$

or

$$L(x, y) = f(x_0, y_0) + \langle a, b \rangle \cdot \langle x - x_0, y - y_0 \rangle,$$

or

$$L(\vec{x}) = f(\vec{x}_0) + \langle a, b \rangle \cdot (\vec{x} - \vec{x}_0). \quad (2)$$

Equations (1) and (2) jointly clarify what the derivative of a function $f : \mathbb{R}^2 \to \mathbb{R}$ should be. It should be a vector. In fact, it should equal the vector $\langle a, b \rangle$, and our aesthetic convictions almost compel us to write $f'(\vec{x}_0) = \langle a, b \rangle$, because then we have from Equation (2) that

$$f(\vec{x}) \approx L(\vec{x}) = f(\vec{x}_0) + f'(\vec{x}_0) \cdot (\vec{x} - \vec{x}_0), \quad (3)$$

which is virtually the same form as Equation (1). How sweet it is!

3.2 Calculation Strategy

The notational serendipity of Equation (3) is all well and good, but somewhat useless unless we can come up with a convenient way to determine what the vector $\langle a, b \rangle$ is for a given function $f(x, y)$ at a particular point $\vec{x}_0 = (x_0, y_0)$, where again we think of this point as also a vector drawn from the origin. Referring to Example (2.1), where $L(x, y) = 4x + 6y - 2$ (and therefore $f'(2, 1) = \langle 4, 6 \rangle$), we seek a way to calculate the vector $\langle a, b \rangle = \langle 4, 6 \rangle$.

According to Definition 2.1, if $f : \mathbb{R}^2 \to \mathbb{R}$ is differentiable at $\vec{x}_0$, then

$$\lim_{\vec{x} \to \vec{x}_0} \frac{e(\vec{x})}{\|\vec{x} - \vec{x}_0\|} = 0. \quad (4)$$

Keeping in mind that $e(\vec{x}) = f(\vec{x}) - L(\vec{x})$, and using Equation (2), we see that Equation (4) reduces to

$$0 = \lim_{\vec{x} \to \vec{x}_0} \frac{f(\vec{x}) - f(\vec{x}_0) - \langle a, b \rangle \cdot (\vec{x} - \vec{x}_0)}{\|\vec{x} - \vec{x}_0\|}$$
$$= \lim_{(x,y) \to (x_0,y_0)} \frac{f(x, y) - f(x_0, y_0) - a(x - x_0) - b(y - y_0)}{\sqrt{(x - x_0)^2 + (y - y_0)^2}}$$
$$= \lim_{(x,y) \to (x_0,y_0)} \left[ \frac{f(x, y) - f(x_0, y_0)}{\sqrt{(x - x_0)^2 + (y - y_0)^2}} - \frac{a(x - x_0) + b(y - y_0)}{\sqrt{(x - x_0)^2 + (y - y_0)^2}} \right]. \quad (5)$$

To find the values of $a$ and $b$ we invoke a clever ploy. Because the limit of Expression (5) exists (and equals zero), the limit will be zero regardless of the direction that $(x, y)$ approaches $(x_0, y_0)$. By picking special directions we can get formulas for the values of $a$ and $b$. Let's see how.

First, let $(x, y)$ approach $(x_0, y_0)$ in a direction parallel to the $x$-axis. The $y$-coordinate is thus fixed at $y_0$, so $(x, y)$ equals $(x, y_0)$. Figure 7 illustrates that, with this restriction, there are only two possible directions of approach. In Figure 7a $(x, y_0)$ approaches $(x_0, y_0)$ from the right, so $x > x_0$. In Figure 7b $(x, y_0)$ approaches $(x_0, y_0)$ from the left, so $x < x_0$.

[Figure 7: Approaching $(x_0, y_0)$ along a line parallel to the $x$-axis. (a) $(x, y_0) \to (x_0, y_0)$ from the right. (b) $(x, y_0) \to (x_0, y_0)$ from the left.]

Next, recast Equation (5) with $(x, y)$ approaching $(x_0, y_0)$ parallel to the $x$-axis, so $(x, y) = (x, y_0)$:

$$0 = \lim_{(x,y_0) \to (x_0,y_0)} \left[ \frac{f(x, y_0) - f(x_0, y_0)}{\sqrt{(x - x_0)^2 + (y_0 - y_0)^2}} - \frac{a(x - x_0) + b(y_0 - y_0)}{\sqrt{(x - x_0)^2 + (y_0 - y_0)^2}} \right]$$
$$= \lim_{(x,y_0) \to (x_0,y_0)} \left[ \frac{f(x, y_0) - f(x_0, y_0)}{\sqrt{(x - x_0)^2}} - \frac{a(x - x_0)}{\sqrt{(x - x_0)^2}} \right]$$
$$= \lim_{(x,y_0) \to (x_0,y_0)} \left[ \frac{f(x, y_0) - f(x_0, y_0)}{|x - x_0|} - \frac{a(x - x_0)}{|x - x_0|} \right]. \quad (6)$$

If $(x, y_0)$ approaches $(x_0, y_0)$ from the right (symbolically, $(x, y_0) \to (x_0, y_0)^+$), the absolute value $|x - x_0|$ equals $x - x_0$ because $x > x_0$. Thus, Equation (6) becomes

$$0 = \lim_{(x,y_0) \to (x_0,y_0)^+} \left[ \frac{f(x, y_0) - f(x_0, y_0)}{x - x_0} - a \right],$$

from which we conclude that

$$\lim_{(x,y_0) \to (x_0,y_0)^+} \frac{f(x, y_0) - f(x_0, y_0)}{x - x_0} = a. \quad (7)$$

If $(x, y_0)$ approaches $(x_0, y_0)$ from the left (symbolically, $(x, y_0) \to (x_0, y_0)^-$), the absolute value $|x - x_0|$ equals $-(x - x_0)$ because $x < x_0$. Thus, Equation (6) becomes

$$0 = \lim_{(x,y_0) \to (x_0,y_0)^-} \left[ \frac{f(x, y_0) - f(x_0, y_0)}{-(x - x_0)} + a \right],$$

from which we conclude that

$$\lim_{(x,y_0) \to (x_0,y_0)^-} \frac{f(x, y_0) - f(x_0, y_0)}{x - x_0} = a. \quad (8)$$

Finally, combining Equations (7) and (8) reveals a method for computing the coefficient $a$:

$$a = \lim_{(x,y_0) \to (x_0,y_0)} \frac{f(x, y_0) - f(x_0, y_0)}{x - x_0} = \lim_{x \to x_0} \frac{f(x, y_0) - f(x_0, y_0)}{x - x_0}. \quad (9)$$

Remark. Note the notational change in Equation (9). Because the $y$-value is always fixed at $y_0$, only the $x$-value is changing. In this context the expression $\lim_{(x,y_0) \to (x_0,y_0)}$ is equivalent to $\lim_{x \to x_0}$.

Example 3.1 Apply Equation (9) to the function of Example (2.1).

Solution. We calculate

$$\lim_{x \to x_0} \frac{f(x, y_0) - f(x_0, y_0)}{x - x_0} = \lim_{x \to x_0} \frac{(x^2 + 2y_0^3 + 6) - (x_0^2 + 2y_0^3 + 6)}{x - x_0} = \lim_{x \to x_0} \frac{x^2 - x_0^2}{x - x_0} = \lim_{x \to x_0} (x + x_0) = 2x_0.$$

Because $(x_0, y_0) = (2, 1)$, we deduce that $\lim_{x \to x_0} (x + x_0) = 2x_0 = 2(2) = 4$.

Remark. In Example (2.1) the linear function approximating $f(x, y) = x^2 + 2y^3 + 6$ was given by $L(x, y) = ax + by + c = 4x + 6y - 2$. Thus, $\langle a, b \rangle = \langle 4, 6 \rangle$, and Equation (9) correctly produced the coefficient $a = 4$, as advertised.

A formula for the coefficient $b$ of Equation (2) can be obtained by running through an argument similar to the one given above for $a$, but by letting $(x, y)$ approach $(x_0, y_0)$ along a line parallel to the $y$-axis. The $x$-coordinate is thus fixed at $x_0$, so $(x, y)$ equals $(x_0, y)$. Doing so ultimately gives

$$b = \lim_{(x_0,y) \to (x_0,y_0)} \frac{f(x_0, y) - f(x_0, y_0)}{y - y_0} = \lim_{y \to y_0} \frac{f(x_0, y) - f(x_0, y_0)}{y - y_0}. \quad (10)$$

The details are left as an exercise.

Exercise 3.1 Show that the coefficient $b$ in Equation (2) is given by Equation (10). To do so let $(x, y)$ approach $(x_0, y_0)$ along a line parallel to the $y$-axis, keeping the $x$-value fixed at $x_0$.

Equations (9) and (10) are so important that they form the basis for the following definitions.

Definition 3.1 We call $\lim_{x \to x_0} \frac{f(x, y_0) - f(x_0, y_0)}{x - x_0}$ the partial derivative of $f$ with respect to $x$ evaluated at the point $(x_0, y_0)$, and we write $f_x(x_0, y_0) = \lim_{x \to x_0} \frac{f(x, y_0) - f(x_0, y_0)}{x - x_0}$, which can also be written as

$$f_x(x_0, y_0) = \lim_{h \to 0} \frac{f(x_0 + h, y_0) - f(x_0, y_0)}{h}.$$

Definition 3.2 We call $\lim_{y \to y_0} \frac{f(x_0, y) - f(x_0, y_0)}{y - y_0}$ the partial derivative of $f$ with respect to $y$ evaluated at the point $(x_0, y_0)$, and we write $f_y(x_0, y_0) = \lim_{y \to y_0} \frac{f(x_0, y) - f(x_0, y_0)}{y - y_0}$, which can also be written as

$$f_y(x_0, y_0) = \lim_{h \to 0} \frac{f(x_0, y_0 + h) - f(x_0, y_0)}{h}.$$

Remark. There is nothing magic about the point $(x_0, y_0)$, and we could just as well talk about the partials evaluated at $(x, y)$, $(u, v)$, or any point in $\mathbb{R}^2$. In other words,

$$f_x(x, y) = \lim_{h \to 0} \frac{f(x + h, y) - f(x, y)}{h} \quad \text{and} \quad f_y(x, y) = \lim_{h \to 0} \frac{f(x, y + h) - f(x, y)}{h},$$

provided, of course, that the limits exist.

Exercise 3.2 Use Definition 3.2 to verify that, at the point $\vec{x}_0 = (2, 1)$, the coefficient $b$ for the function of Example (2.1) is indeed equal to 6.
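
The difference quotients in Definitions 3.1 and 3.2 can also be evaluated numerically. The sketch below is an illustration rather than part of the original notes: it replaces the limit by a single small step `h` (an assumed value, so the results are approximations only) and applies the quotients to the function of Example (2.1) at $(2, 1)$. The helper names `partial_x` and `partial_y` are hypothetical.

```python
def partial_x(f, x0, y0, h=1e-6):
    # Difference quotient of Definition 3.1 with a small fixed h (no limit taken)
    return (f(x0 + h, y0) - f(x0, y0)) / h


def partial_y(f, x0, y0, h=1e-6):
    # Difference quotient of Definition 3.2 with a small fixed h (no limit taken)
    return (f(x0, y0 + h) - f(x0, y0)) / h


def f(x, y):
    # Function of Example 2.1
    return x**2 + 2 * y**3 + 6


a = partial_x(f, 2.0, 1.0)
b = partial_y(f, 2.0, 1.0)
print(a, b)  # should land close to the coefficients 4 and 6 of L(x, y) = 4x + 6y - 2
```

Such a check only suggests the values of $a$ and $b$; the exercises still call for the limit argument itself.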

If $z = f(x, y)$, you will see many of the following notations to indicate partial derivatives:

$$f_x(x, y) = \frac{\partial z}{\partial x} = \frac{\partial f}{\partial x} = \frac{\partial}{\partial x} f(x, y) = f_1(x, y) = f^{(1,0)}(x, y) = D_1 f = D_x f;$$
$$f_y(x, y) = \frac{\partial z}{\partial y} = \frac{\partial f}{\partial y} = \frac{\partial}{\partial y} f(x, y) = f_2(x, y) = f^{(0,1)}(x, y) = D_2 f = D_y f.$$

The good news in doing calculations comes from realizing that partial derivatives are nothing more than derivatives. When we take the partial derivative with respect to $x$, the $y$-variable stays fixed, so we can treat it as if it were a constant. Similarly, the $x$-variable acts like a constant when we compute the partial derivative with respect to $y$.

Example 3.2
$$\frac{\partial}{\partial x}\left(x^5 y^3 + e^{xy^2}\right) = 5x^4 y^3 + y^2 e^{xy^2}; \qquad \frac{\partial}{\partial y}\left(x^5 y^3 + e^{xy^2}\right) = 3x^5 y^2 + 2xy\, e^{xy^2}.$$

Example 3.3
$$\frac{\partial}{\partial x}\left(\frac{x^3 + y^4}{x^2 y + y^3 x}\right) = \frac{(x^2 y + y^3 x)\frac{\partial}{\partial x}(x^3 + y^4) - (x^3 + y^4)\frac{\partial}{\partial x}(x^2 y + y^3 x)}{(x^2 y + y^3 x)^2} = \frac{(x^2 y + y^3 x)(3x^2) - (x^3 + y^4)(2xy + y^3)}{(x^2 y + y^3 x)^2}.$$

Example 3.4 Evaluate $f_x(0.25, 1.25)$ and $f_y(0.25, 1.25)$ for the function $f(x, y) = \cos x \sin y + 2$.

Solution. $f_x(x, y) = -\sin x \sin y$, so $f_x(0.25, 1.25) = -\sin(0.25)\sin(1.25) \approx -0.23$; $f_y(x, y) = \cos x \cos y$, so $f_y(0.25, 1.25) = \cos(0.25)\cos(1.25) \approx 0.31$.

Example 3.5 Use the result of Example (3.4) to find the equation of the plane tangent to the graph of $f(x, y) = \cos x \sin y$ at the point $(0.25, 1.25, f(0.25, 1.25)) \approx (0.25, 1.25, 0.92)$.

Solution. According to Equation (3) the tangent plane is the linear function $L$ given by

$$z = L(x, y) = f(0.25, 1.25) + f'(0.25, 1.25) \cdot \langle x - 0.25, y - 1.25 \rangle$$
$$\approx 0.92 + \langle -0.23, 0.31 \rangle \cdot \langle x - 0.25, y - 1.25 \rangle \quad \text{[From Example (3.4)]}$$
$$= 0.92 + (-0.23)(x - 0.25) + (0.31)(y - 1.25)$$
$$= -0.23x + 0.31y + 0.59.$$

Exercise 3.3 Find the equation of the plane tangent to the graph of $f(x, y) = x^2 + y^2 + x^2 y$ at the point $(2, 3, f(2, 3)) = (2, 3, 25)$.

Exercise 3.4 Find $\frac{\partial f}{\partial x}$ and $\frac{\partial f}{\partial y}$ if $f(x, y) = x e^{x^2 y^2}$.

Exercise 3.5 Find $\frac{\partial f}{\partial x}$ and $\frac{\partial f}{\partial y}$ if $f(x, y) = \sin\left(\frac{x}{y}\right)$.
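
Returning to Examples (3.4) and (3.5), the tangent plane can be assembled numerically from the analytic partials and compared with $f$ at a nearby point. This is a sketch for illustration only: the helper `tangent_plane` and the test point $(0.3, 1.2)$ are assumptions, not part of the original notes.

```python
import math


def f(x, y):
    # Function of Example 3.5
    return math.cos(x) * math.sin(y)


def f_x(x, y):
    # Partial with respect to x (y treated as a constant)
    return -math.sin(x) * math.sin(y)


def f_y(x, y):
    # Partial with respect to y (x treated as a constant)
    return math.cos(x) * math.cos(y)


def tangent_plane(x0, y0):
    # Builds L(x, y) = f(x0, y0) + f_x(x0, y0)(x - x0) + f_y(x0, y0)(y - y0)
    z0, a, b = f(x0, y0), f_x(x0, y0), f_y(x0, y0)
    return lambda x, y: z0 + a * (x - x0) + b * (y - y0)


L = tangent_plane(0.25, 1.25)
# Coefficients a and b, then the constant term c = L(0, 0)
print(f_x(0.25, 1.25), f_y(0.25, 1.25), L(0.0, 0.0))
# Near the point of tangency the plane should track f closely
print(f(0.3, 1.2), L(0.3, 1.2))
```

Because the worked solution rounds the partials to two decimals before combining them, the constant term printed here can differ from 0.59 in the third decimal place.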

3.3 Geometric Interpretation

Imagine the graph of a function $f : \mathbb{R}^2 \to \mathbb{R}$, and slice this graph with the plane $y = y_0$. Figure 8a illustrates this process for the function $f(x, y) = 7 - x^2 - y^2$ and the plane $y = 1$. The plane intersects the graph in a curve that is traced out by the vector valued function $\vec{r}(t) = \langle t, y_0, f(t, y_0) \rangle$, for $t \in [a, b]$. Figure 8b illustrates this idea for $f(x, y) = 7 - x^2 - y^2$, with $y_0 = 1$. The dark curved line on the surface of the graph is the path traced by $\vec{r}(t) = \langle t, 1, f(t, 1) \rangle$ for $-1.7 \le t \le 1.7$, so $\vec{r}(t) = \langle t, 1, 6 - t^2 \rangle$.

In general, $f_x(x_0, y_0)$ is the derivative with respect to $x$ of the function $f(x, y_0)$ evaluated at $x = x_0$. It is also the derivative of the third component of the vector valued function $\vec{r}(t) = \langle t, y_0, f(t, y_0) \rangle$ evaluated at $t = x_0$. In other words, it is the third component of

$$\vec{r}\,'(x_0) = \left\langle 1, 0, \frac{d}{dt} f(t, y_0)\Big|_{t = x_0} \right\rangle = \langle 1, 0, f_x(x_0, y_0) \rangle.$$

For our specific example, $\vec{r}\,'(t) = \langle 1, 0, -2t \rangle$, so $\vec{r}\,'(x_0) = \langle 1, 0, -2x_0 \rangle$. Now, $f_x(x, y) = -2x$, so if we set $(x_0, y_0) = (\frac{1}{2}, 1)$, we get $f_x(x_0, y_0) = f_x(\frac{1}{2}, 1) = -1$. This value is identical to the third component of $\vec{r}\,'(x_0) = \vec{r}\,'(\frac{1}{2}) = \langle 1, 0, -1 \rangle$.

[Figure 8: Geometric interpretation of the partial derivative. (a) Slicing the surface with a plane. (b) The resulting curve.]

The geometric interpretation of $f_x(\frac{1}{2}, 1)$ becomes clear by looking at $\vec{r}\,'(\frac{1}{2}) = \langle 1, 0, -1 \rangle$. The latter equation tells us that, at the point $(\frac{1}{2}, 1)$, every one-unit change in the $x$ direction and zero-unit change in the $y$ direction produces an instantaneous rate of change in the $z$ direction of $-1$ unit.

Similarly, slicing the graph of $f$ by the plane $x = 1$ produces a curve $\vec{r}(t) = \langle 1, t, f(1, t) \rangle$, as Figure 9 shows. This time we compute

$$\vec{r}\,'(y_0) = \left\langle 0, 1, \frac{d}{dt} f(x_0, t)\Big|_{t = y_0} \right\rangle = \langle 0, 1, f_y(x_0, y_0) \rangle.$$

For $\vec{r}(t) = \langle 1, t, 6 - t^2 \rangle$, we get $\vec{r}\,'(t) = \langle 0, 1, -2t \rangle$, and the third component of this vector is the partial of $f$ with respect to $y$ evaluated at $y = t$. If we set $(x_0, y_0) = (1, \frac{2}{5})$, then $f_y(1, \frac{2}{5}) = -\frac{4}{5}$, and $\vec{r}\,'(\frac{2}{5}) = \langle 0, 1, -\frac{4}{5} \rangle$. The interpretation is that, at the point $(1, \frac{2}{5})$, every zero-unit change in the $x$ direction and one-unit change in the $y$ direction produces an instantaneous rate of change in the $z$ direction of $-\frac{4}{5}$ units.
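
The slicing picture can be mimicked numerically. The sketch below, which is an illustration and not part of the original notes, builds the curve $\vec{r}(t) = \langle t, 1, f(t, 1) \rangle$ for $f(x, y) = 7 - x^2 - y^2$ and estimates $\vec{r}\,'(\frac{1}{2})$ with a central difference. The step size `h` and the helper name `r_prime` are assumptions made for the example.

```python
def f(x, y):
    # Surface used in the geometric interpretation
    return 7 - x**2 - y**2


def r(t, y0=1.0):
    # Curve cut from the graph of f by the plane y = y0
    return (t, y0, f(t, y0))


def r_prime(t, h=1e-6):
    # Central-difference estimate of r'(t), component by component
    ahead, behind = r(t + h), r(t - h)
    return tuple((p - q) / (2 * h) for p, q in zip(ahead, behind))


velocity = r_prime(0.5)
print(velocity)  # compare with the vector <1, 0, -1> found in the text
```

The third component of the estimate plays the role of $f_x(\frac{1}{2}, 1)$; swapping the roles of the first two components of `r` would give the corresponding check for the slice $x = 1$.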

[Figure 9: The partial derivative with respect to $y$ evaluated at $(x_0, y_0)$]

To see an animation of Figures 8b and 9 visit the following link: http://www.westmont.edu/~howell/courses/ma-019/illustrations/derivs/p-deriv.html.

4 Higher Derivatives and Higher Dimensions

Partial derivatives can be evaluated at various points, so we can consider them to be functions. For example, if $f(x, y) = \sin(xy)$, then $f_x(x, y) = y\cos(xy)$, so $f_x : \mathbb{R}^2 \to \mathbb{R}$. Thus, partial derivatives themselves may have partial derivatives. What notation should we use to indicate the partial derivative with respect to $y$ of the function $f_x$? A natural choice is $f_{xy}$ because reading left to right gives us the partials in the order that they occur. Taking the partial derivative with respect to $x$ of the function $f_y$ would be expressed as $f_{yx}$. The game doesn't stop here. We can compute the partial derivative of the function $f_{xy}$ with respect to $x$, and so forth. If we write $z = f(x, y)$, we can express these higher derivatives notationally in a variety of ways:

$$f_{xx}(x, y) = \frac{\partial^2 z}{\partial x^2} = \frac{\partial^2 f}{\partial x^2} = D_{xx} f = f^{(2,0)}(x, y);$$
$$f_{xy}(x, y) = \frac{\partial^2 z}{\partial y\, \partial x} = \frac{\partial^2 f}{\partial y\, \partial x} = D_{xy} f = f^{(1,1)}(x, y);$$
$$f_{xyy}(x, y) = \frac{\partial^3 z}{\partial y^2\, \partial x} = \frac{\partial^3 f}{\partial y^2\, \partial x} = D_{xyy} f = f^{(1,2)}(x, y);$$
$$f_{xyx}(x, y) = \frac{\partial^3 z}{\partial x\, \partial y\, \partial x} = \frac{\partial^3 f}{\partial x\, \partial y\, \partial x} = D_{xyx} f = f^{(2,1)}(x, y).$$

Notice that the higher partial $f_{xy}$ reads left to right, but the symbol $\frac{\partial^2 f}{\partial y\, \partial x}$ reads right to left. The reason for this apparent anomaly is that the expression $\frac{\partial^2 f}{\partial y\, \partial x}$ is really shorthand for $\frac{\partial}{\partial y}\left(\frac{\partial f}{\partial x}\right)$, and is read as either, "The partial with respect to $y$ of the partial of $f$ with respect to $x$," or, "The second partial of $f$, first with respect to $x$, then with respect to $y$." Notice, also, that there is some ambiguity in the superscript notation. For example, does $f^{(1,1)}(x, y)$ intend to denote $f_{xy}$ or $f_{yx}$? The two expressions are not always equal, but thanks to the French mathematician Alexis Claude Clairaut (1713-1765), the continuity of $f_{xy}$ and $f_{yx}$ ensures their equality.
The following theorem states this fact more precisely.

Theorem 1 (Clairaut's Theorem) Suppose $f_{xy}$ and $f_{yx}$ are continuous at all points located in a disk centered at the point $(a, b)$. Then $f_{xy}(a, b) = f_{yx}(a, b)$.

Exercise 4.1 Verify Clairaut's theorem for $f(x, y) = x^5 y^4 + \sin(x^2 y)$ by evaluating the partials.

What about functions $f : \mathbb{R}^n \to \mathbb{R}$? Happily, everything we've said up to this point carries over in a very straightforward way. For example, if $w = f(x, y, z) = x^5 + x^3 y^4 z^3 + yz^{10}$, then $\frac{\partial w}{\partial y} = 4x^3 y^3 z^3 + z^{10}$. In general, given $\vec{x}_0 \in \mathbb{R}^n$ and $f : \mathbb{R}^n \to \mathbb{R}$, it can be shown that

$$f(\vec{x}) \approx L(\vec{x}) = f(\vec{x}_0) + f'(\vec{x}_0) \cdot (\vec{x} - \vec{x}_0),$$

where $f'(\vec{x}_0) = \langle f_{x_1}(\vec{x}_0), f_{x_2}(\vec{x}_0), \ldots, f_{x_n}(\vec{x}_0) \rangle$, and $f_{x_k}(\vec{x}_0)$ designates the partial derivative with respect to the $k$th coordinate evaluated at $\vec{x}_0$.

Example 4.1 Find $f'(x, y, z)$ if $f(x, y, z) = \frac{x}{y^3 + z^4}$.

Solution. $f'(x, y, z) = \langle f_x, f_y, f_z \rangle = \left\langle \frac{1}{y^3 + z^4}, \frac{-3xy^2}{(y^3 + z^4)^2}, \frac{-4xz^3}{(y^3 + z^4)^2} \right\rangle$.

Remark. Many texts use the symbol $\nabla f(\vec{x})$ to designate the vector of partial derivatives, and $\nabla f$ is called the gradient of $f$. Thus, in the last example we would write

$$\nabla f(x, y, z) = \left\langle \frac{1}{y^3 + z^4}, \frac{-3xy^2}{(y^3 + z^4)^2}, \frac{-4xz^3}{(y^3 + z^4)^2} \right\rangle.$$

It is important, however, not to equate $\nabla f(\vec{x})$ with $f'(\vec{x})$. Certainly, if a function $f$ has a derivative at the point $\vec{x}_0$, then all its partial derivatives exist at $\vec{x}_0$. However, the existence of all the partials at $\vec{x}_0$ is not sufficient to guarantee that the function $f$ is differentiable. Can you describe what such a situation would look like geometrically?

Figure 10 illustrates a function $f(x, y)$ that is not differentiable at $(0, 0)$ even though its partial derivatives exist at that point. Its graph is shaped like a smooth spherical hill, except there is a cataclysmic drop to a flat plateau right along the $x$- and $y$-axes in the direction of the first quadrant.
[Figure 10: A non-differentiable function at $(0, 0)$ whose partials exist there. (a) The function itself. (b) The function and tangent plane at $(0, 0, 1000)$.]

If you were walking on this hill along the ridge $\{(x, 0, f(x, 0)) : x > 0\}$, all would be fine as long as your $y$-coordinate remained at 0. If, however, you took a tiny step in the positive $y$-direction, well, on the way down shout out to your friends, "Call 911!"

The ridges defined by $\{(x, 0, f(x, 0)) : x \ge 0\}$ and $\{(0, y, f(0, y)) : y \ge 0\}$ are nice and smooth, so $f_x(0, 0)$ and $f_y(0, 0)$ clearly exist. But $f$ is not differentiable at $(0, 0)$ because there is no good linear (i.e., tangent plane) approximation to $f$ at that point. To see why, assume that the height of the graph of $f$ at $(0, 0)$ is 1000 feet. The only possible tangent plane at the point $(0, 0, 1000)$ is clearly parallel to the $xy$-plane, so it has the equation $z = 1000$. As Figure 10b illustrates, this plane is a good approximation to the function $z = f(x, y)$ for points $(x, y)$ near $(0, 0)$ unless $(x, y)$ is in the first quadrant. In that case, the tangent plane, whose $z$-values equal 1000, is a miserable approximation to the function $z = f(x, y)$, whose $z$-values equal the height of the plateau.

Exercise 4.2 What do $f_x(0, 0)$ and $f_y(0, 0)$ equal for the function illustrated in Figure 10a?

Exercise 4.3 The function in Figure 10a is discontinuous at $(0, 0)$. Explain why and then identify any other points at which the function is discontinuous. Finally, sketch a function that is continuous at $(0, 0)$, yet whose partial derivatives do not exist at that point. Explain.

The following theorem tells us when we can be sure that a function is differentiable.

Theorem 2 Suppose $f : \mathbb{R}^n \to \mathbb{R}$, and all the partial derivatives exist and are continuous at $\vec{x}_0$. Then $f$ is differentiable at $\vec{x}_0$.

5 Directional Derivatives

Let's take stock of what we've learned so far. A function $f : \mathbb{R}^n \to \mathbb{R}$ is differentiable at $\vec{x}_0 \in \mathbb{R}^n$ if and only if there exists a linear function $L : \mathbb{R}^n \to \mathbb{R}$ that is a good approximation to the function $f$ for points $\vec{x}$ near $\vec{x}_0$. By saying that $L$ is a good approximation to $f$ we mean that $\lim_{\vec{x} \to \vec{x}_0} \frac{e(\vec{x})}{\|\vec{x} - \vec{x}_0\|} = 0$, where $e(\vec{x}) = f(\vec{x}) - L(\vec{x})$. In other words, the error term (that measures the difference between $f$ and $L$) approaches zero much faster than does the distance between $\vec{x}$ and $\vec{x}_0$. The linear function $L$ has the form $L(\vec{x}) = f(\vec{x}_0) + \vec{m} \cdot (\vec{x} - \vec{x}_0)$, and the vector $\vec{m}$ has the partial derivatives of $f$ as its components. We define the derivative of $f$ at the point $\vec{x}_0$, therefore, to be that vector.
Thus, $\vec{m} = f'(\vec{x}_0) = \langle f_{x_1}(\vec{x}_0), f_{x_2}(\vec{x}_0), \ldots, f_{x_n}(\vec{x}_0) \rangle$. Thus, if a function $f : \mathbb{R}^n \to \mathbb{R}$ is differentiable at $\vec{x}_0$ we know that
\[ f(\vec{x}) \approx L(\vec{x}) = f(\vec{x}_0) + f'(\vec{x}_0) \cdot (\vec{x} - \vec{x}_0), \]
where $f'(\vec{x}_0) = \langle f_{x_1}(\vec{x}_0), f_{x_2}(\vec{x}_0), \ldots, f_{x_n}(\vec{x}_0) \rangle$.

For functions of two variables Equation (5) showed that the expression $\lim_{\vec{x} \to \vec{x}_0} \frac{e(\vec{x})}{\|\vec{x} - \vec{x}_0\|} = 0$ is the same as
\[ \lim_{(x,y) \to (x_0,y_0)} \left[ \frac{f(x, y) - f(x_0, y_0)}{\sqrt{(x - x_0)^2 + (y - y_0)^2}} - \frac{a(x - x_0) + b(y - y_0)}{\sqrt{(x - x_0)^2 + (y - y_0)^2}} \right] = 0, \]
where $f'(\vec{x}_0) = \langle a, b \rangle$. The partial derivatives we looked at in the last section grew out of evaluating these limits as $(x, y)$ approached $(x_0, y_0)$ along straight lines parallel to the x- and y-axes. Of course, there are other straight line directions that we can look at. What would happen in $\mathbb{R}^2$ if $\vec{x} = (x, y)$ were to approach $\vec{x}_0 = (x_0, y_0)$ along the direction of some arbitrary unit vector, say $\vec{u} = (u_1, u_2)$? In this case, we could write $\vec{x}$ as $\vec{x}_0 + h\vec{u} = (x_0, y_0) + h(u_1, u_2) = (x_0 + hu_1, y_0 + hu_2)$, and Equation (5) would then become
\[ \lim_{h \to 0} \left[ \frac{f(x_0 + hu_1, y_0 + hu_2) - f(x_0, y_0)}{\sqrt{(hu_1)^2 + (hu_2)^2}} - \frac{a(hu_1) + b(hu_2)}{\sqrt{(hu_1)^2 + (hu_2)^2}} \right] = 0, \]
or
\[ \lim_{h \to 0} \left[ \frac{f(\vec{x}_0 + h\vec{u}) - f(\vec{x}_0)}{\sqrt{(hu_1)^2 + (hu_2)^2}} - \frac{a(hu_1) + b(hu_2)}{\sqrt{(hu_1)^2 + (hu_2)^2}} \right] = 0, \tag{11} \]

where, as before, $a = f_x(x_0, y_0)$ and $b = f_y(x_0, y_0)$. Letting $h$ approach zero from the right and then the left will reveal that Equation (11) implies
\[ \lim_{h \to 0} \frac{f(\vec{x}_0 + h\vec{u}) - f(\vec{x}_0)}{h} = \langle a, b \rangle \cdot \vec{u}. \tag{12} \]

Exercise 5.1 Show that Equation (12) is a consequence of Equation (11) by analyzing what happens when $h$ approaches zero from the right and then from the left.

Partial derivatives are merely special cases of directional derivatives. For partial derivatives the direction of approach is either along the x-axis or the y-axis. As with partial derivatives there are a variety of notations for directional derivatives:
\[ \lim_{h \to 0} \frac{f(\vec{x}_0 + h\vec{u}) - f(\vec{x}_0)}{h} = f_{\vec{u}}(\vec{x}_0) = \frac{\partial f}{\partial \vec{u}} = D_{\vec{u}} f. \]
Our findings apply to functions of more than two variables as well. Below we formalize all of the above with a definition and theorem that apply to functions $f : \mathbb{R}^n \to \mathbb{R}$.

Definition 5.1 If $f : \mathbb{R}^n \to \mathbb{R}$, the quantity
\[ f_{\vec{u}}(\vec{x}_0) = \lim_{h \to 0} \frac{f(\vec{x}_0 + h\vec{u}) - f(\vec{x}_0)}{h} \]
is called the directional derivative of $f$ with respect to the unit vector $\vec{u}$ evaluated at $\vec{x}_0$, provided the limit exists.

Theorem 3 If $f : \mathbb{R}^n \to \mathbb{R}$ is differentiable at $\vec{x}_0$, then the directional derivative of $f$ exists in the direction of any unit vector $\vec{u}$, and
\[ f_{\vec{u}}(\vec{x}_0) = \langle f_{x_1}(\vec{x}_0), f_{x_2}(\vec{x}_0), \ldots, f_{x_n}(\vec{x}_0) \rangle \cdot \vec{u}. \]

Figure 11 shows: (1) $f(x, y) = x^{4/3} + y^2 + 2$; (2) the point $\vec{x}_0 = (x_0, y_0) = \left(\tfrac{1}{2}, \tfrac{1}{5}\right)$; (3) the tangent plane (linear function) approximation to $f$ at the point $\left(x_0, y_0, f(x_0, y_0)\right) \approx \left(\tfrac{1}{2}, \tfrac{1}{5}, 2.4\right)$; (4) the unit vector $\vec{u} = \left\langle \tfrac{3}{5}, \tfrac{4}{5} \right\rangle$ drawn from $\vec{x}_0$; (5) $\nabla f(\vec{x}_0) \approx \langle 1.06, 0.4 \rangle$. (Recall that $\nabla f$, also called the gradient, is the vector of partial derivatives.) The displacement on the tangent plane directly above the points $\vec{x}_0$ and $\vec{x}_0 + \vec{u}$ is illustrated by the dark slanted line. The slope of this line is $f_{\vec{u}}(\vec{x}_0)$. Since $\vec{u}$ is a unit vector, the slope is the distance represented by the dark vertical line directly above the point $\vec{x}_0 + \vec{u}$. In this case we compute
\[ f_{\vec{u}}(\vec{x}_0) \approx \langle 1.06, 0.4 \rangle \cdot \left\langle \tfrac{3}{5}, \tfrac{4}{5} \right\rangle = 0.956. \]
The value 0.956 indicates that, at the instant when $\vec{x}_0 = \left(\tfrac{1}{2}, \tfrac{1}{5}\right)$, every unit change in the direction of $\vec{u} = \left\langle \tfrac{3}{5}, \tfrac{4}{5} \right\rangle$ produces 0.956 units of change in the z-value. Alternatively, we could say that, at the instant when $\vec{x}_0 = \left(\tfrac{1}{2}, \tfrac{1}{5}\right)$, every $\tfrac{3}{5}$ units of change in $x$ and $\tfrac{4}{5}$ units of change in $y$ produces 0.956 units of change in the z-value.

Figure 11: A directional derivative

The circular segment drawn at the top of the tangent plane maps out where different displacement vectors above $\vec{x}_0 + \vec{u}$ would appear for unit vectors $\vec{u} = (\cos\theta, \sin\theta)$, where $0 \le \theta \le \tfrac{\pi}{2}$. To see an animation of Figure 11 visit the following link: http://www.westmont.edu/~howell/courses/ma-019/illustrations/derivs/d-deriv.html.

The maximum value of $f_{\vec{u}}(\vec{x}_0)$ seems to occur when $\vec{u}$ happens to be in the same direction as $\nabla f(\vec{x}_0)$. This phenomenon is always the case, and explains why $\nabla f$ is called the gradient of $f$. Theorem 4 formalizes this result.
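The Figure 11 computation can be checked numerically. The following is a minimal Python sketch, not part of the original notes: the function names, the step size `h`, and the 3600-point scan over angles are illustrative assumptions. It evaluates the formula of Theorem 3, compares it with the difference quotient of Definition 5.1, and scans unit vectors to see where the directional derivative is largest. With the unrounded gradient the dot product comes out near 0.955; the value 0.956 in the text results from rounding the first gradient component to 1.06.

```python
import math

def f(x, y):
    """Function from Figure 11: f(x, y) = x^(4/3) + y^2 + 2 (used here for x > 0)."""
    return x ** (4 / 3) + y ** 2 + 2

def grad_f(x, y):
    """Analytic gradient <f_x, f_y> = <(4/3) x^(1/3), 2y>."""
    return ((4 / 3) * x ** (1 / 3), 2 * y)

def directional_derivative(point, u):
    """Theorem 3: dot product of the gradient with the unit vector u."""
    gx, gy = grad_f(*point)
    return gx * u[0] + gy * u[1]

def difference_quotient(point, u, h=1e-6):
    """Definition 5.1 approximated with a small step h (an assumed value)."""
    x0, y0 = point
    return (f(x0 + h * u[0], y0 + h * u[1]) - f(x0, y0)) / h

x0 = (0.5, 0.2)
u = (3 / 5, 4 / 5)

exact = directional_derivative(x0, u)
approx = difference_quotient(x0, u)

# Scan unit vectors u = (cos t, sin t) and record the angle with the largest slope.
angles = [2 * math.pi * k / 3600 for k in range(3600)]
best_theta = max(
    angles,
    key=lambda t: directional_derivative(x0, (math.cos(t), math.sin(t))),
)
gradient_angle = math.atan2(grad_f(*x0)[1], grad_f(*x0)[0])

print("Theorem 3 value:    ", round(exact, 3))
print("difference quotient:", round(approx, 3))
print("best angle vs gradient angle:", round(best_theta, 3), round(gradient_angle, 3))
```

The angle that maximizes the scanned directional derivative agrees with the angle of the gradient vector up to the grid spacing, which is the behavior the surrounding text describes.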

Theorem 4 Suppose $f : \mathbb{R}^n \to \mathbb{R}$ is differentiable at $\vec{x}_0$. The maximum value for $f_{\vec{u}}(\vec{x}_0)$ occurs when $\vec{u}$ is in the same direction as $\nabla f(\vec{x}_0) = \langle f_{x_1}(\vec{x}_0), \ldots, f_{x_n}(\vec{x}_0) \rangle$, and the minimum value for $f_{\vec{u}}(\vec{x}_0)$ occurs when $\vec{u}$ is in the same direction as $-\nabla f(\vec{x}_0)$.

Proof. $f_{\vec{u}}(\vec{x}_0) = \nabla f(\vec{x}_0) \cdot \vec{u} = \|\nabla f(\vec{x}_0)\| \, \|\vec{u}\| \cos\theta = \|\nabla f(\vec{x}_0)\| \cos\theta$, where $\theta$ is the angle between $\nabla f(\vec{x}_0)$ and $\vec{u}$. Now $\|\nabla f(\vec{x}_0)\|$ is constant, so the product $\|\nabla f(\vec{x}_0)\| \cos\theta$ is maximized when $\theta = 0$, and minimized when $\theta = \pi$.

Exercise 5.2 Rhett has so far been unable to impress anyone with his macho feats, so he has come up with a new scheme to show off his manhood. He gets everyone's attention and then jumps barefoot onto a flat bed of burning coals, landing at the point $(1, 2)$, measured in feet. The temperature (degrees Fahrenheit) at any point $(a, b)$ on the bed of coals equals $200 - a^3 b - a^4 b^2$, so he quickly senses the folly of his actions. In what direction should he move so as to have the temperature decrease most rapidly? At what rate will the temperature change if Rhett leaves the point $(1, 2)$ and heads directly towards the point $(3, 5)$? Make sure your answer contains appropriate units of measurement.

Exercise 5.3 Consider the function shown in Figure 10, and unit vectors $\vec{u} = \langle \cos\theta, \sin\theta \rangle$. For what values of $\theta$ does $f_{\vec{u}}(0, 0)$ exist, and what does it equal? Explain your reasoning.

6 The Multivariable Chain Rule

6.1 Introduction

For functions of one variable, $f'(c)$ is the slope of the line that is tangent to the graph of the function $f$ at the point $(c, f(c))$. The instantaneous rate of change perspective of the derivative focuses on how fast the function $f$ is changing at a particular point. For example, if $f'(c) = 4$, we can say that, at the instant when $x = c$, the y value is changing four times as fast as the x value.
If the function $f$ were to continue to behave as it is at this particular instant, every unit change in $x$ from the point $x = c$ would produce four units of change in the function $f$.

The chain rule tells us how to differentiate a function $h$ that is defined as a chain of two functions, that is, $h(x) = f(g(x)) = (f \circ g)(x)$. Figure 12 illustrates this concept.

Figure 12: The single variable chain rule (diagram: $g$ maps $c \in \mathbb{R}$ to $g(c) \in \mathbb{R}$, and $f$ maps $g(c)$ to $h(c) = f(g(c)) \in \mathbb{R}$, where $h(x) = f(g(x)) = (f \circ g)(x)$)

Suppose that $g'(c) = 2$, and $f'(g(c)) = 3$. What could we conclude about the value of $h'(c)$? Using the instantaneous rate of change perspective of the derivative, we interpret $g'(c) = 2$ to mean that every unit change in $x$ from the point $x = c$ produces two units of change in the function $g$. Also, $f'(g(c)) = 3$ means that every unit change in $x$ from the point $x = g(c)$ produces three units of change in the function $f$. It stands to reason, then, that every unit change in $x$ from the point $x = c$ produces $g'(c) f'(g(c)) = (2)(3) = 6$ units of change in the function $h$. Thus, $h'(c) = g'(c) f'(g(c))$, or $h'(c) = f'(g(c)) \, g'(c)$.
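The numbers 2, 3, and 6 in this argument can be reproduced with a concrete pair of functions. The sketch below is an illustration only: the choices $g(x) = x^2$, $f(y) = y^3$, $c = 1$, the helper name `derivative`, and its step size are assumptions made for this example and do not come from the notes. With these choices $g'(1) = 2$, $f'(g(1)) = f'(1) = 3$, and $h(x) = x^6$.

```python
def g(x):
    """Inner function (assumed for illustration): g'(1) = 2."""
    return x ** 2

def f(y):
    """Outer function (assumed for illustration): f'(1) = 3."""
    return y ** 3

def h(x):
    """The chain h(x) = f(g(x)), which here equals x^6."""
    return f(g(x))

def derivative(func, point, step=1e-5):
    """Central difference approximation to func'(point)."""
    return (func(point + step) - func(point - step)) / (2 * step)

c = 1.0
g_prime = derivative(g, c)          # rate of change of g at c
f_prime_at_g = derivative(f, g(c))  # rate of change of f at g(c), not at c
h_prime = derivative(h, c)          # rate of change of the chain at c

print("g'(c)            :", round(g_prime, 6))
print("f'(g(c))         :", round(f_prime_at_g, 6))
print("f'(g(c)) * g'(c) :", round(f_prime_at_g * g_prime, 6))
print("h'(c) directly   :", round(h_prime, 6))
```

The product of the two separately estimated rates matches the rate of change of the composite function estimated directly, which is what $h'(c) = f'(g(c)) \, g'(c)$ asserts. Note that $f'$ is evaluated at $g(c)$ rather than at $c$.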

Exercise 6.1 For $h(x) = \left(4x^3 + 2x + 7\right)^2 = f(g(x))$, identify the following: a. $f(x)$; b. $g(x)$; c. $f'(13)$; d. $g(1)$; e. $g'(1)$; f. $h'(1)$.

Exercise 6.2 If $f(x) = x^3$ and $g(x) = 3x + 4$, evaluate $f(g(x))$ and $g(f(x))$.

6.2 Higher Dimensions

The multivariable chain rule is remarkably similar to its single-variable counterpart. Consider the function $h(t) = f(\vec{g}(t))$, where $\vec{g} : \mathbb{R} \to \mathbb{R}^2$, and $f : \mathbb{R}^2 \to \mathbb{R}$, as shown in Figure 13. The middle circle illustrates the output of the function $\vec{g}$ in red, with the point $\vec{g}(c)$ represented by a blue dot. The blue vector is $\vec{g}'(c)$. It is depicted as being 2 units long by hash marks, so that $\|\vec{g}'(c)\| = 2$.

Figure 13: The multivariable chain rule (diagram: $\vec{g}$ maps $c \in \mathbb{R}$ to the point $\vec{g}(c)$ on the curve $\vec{g}(t)$ in the xy-plane, and $f$ maps $\vec{g}(c)$ to $h(c) = f(\vec{g}(c)) \in \mathbb{R}$, where $h(t) = f(\vec{g}(t)) = (f \circ \vec{g})(t)$)

Let $\vec{u}$ be the unit vector in the direction of $\vec{g}'(c)$. That is, $\vec{u} = \frac{\vec{g}'(c)}{\|\vec{g}'(c)\|}$. The first two circles in Figure 13 illustrate that every unit change from the point $c$ produces $\|\vec{g}'(c)\| = 2$ units of (instantaneous) change in the function $\vec{g}$ in the direction of $\vec{u}$. As the second and third circles in the figure illustrate, every unit change from the point $\vec{g}(c)$ in the direction of $\vec{u}$ produces three units of change in the function $f$. This (instantaneous) change is computed by the directional derivative, i.e., $f_{\vec{u}}(\vec{g}(c)) = 3$. Obviously, then, every unit change from the point $c$ produces $\|\vec{g}'(c)\| \, f_{\vec{u}}(\vec{g}(c)) = (2)(3) = 6$ units of (instantaneous) change in the function $h$. That is,
\[ h'(c) = \|\vec{g}'(c)\| \, f_{\vec{u}}(\vec{g}(c)). \tag{13} \]
Now, according to Theorem 3,
\[ f_{\vec{u}}(\vec{g}(c)) = f'(\vec{g}(c)) \cdot \vec{u} = f'(\vec{g}(c)) \cdot \frac{\vec{g}'(c)}{\|\vec{g}'(c)\|}. \tag{14} \]
Combining Equations (13) and (14) gives
\[ h'(c) = \|\vec{g}'(c)\| \, f_{\vec{u}}(\vec{g}(c)) = \|\vec{g}'(c)\| \, f'(\vec{g}(c)) \cdot \frac{\vec{g}'(c)}{\|\vec{g}'(c)\|} = f'(\vec{g}(c)) \cdot \vec{g}'(c). \]
The following theorem summarizes these observations.

Theorem 5 Suppose $\vec{g} : \mathbb{R} \to \mathbb{R}^n$ is differentiable at $c$, and $f : \mathbb{R}^n \to \mathbb{R}$ is differentiable at $\vec{g}(c)$. Then $h(t) = f(\vec{g}(t))$ is differentiable at $c$, and $h'(c) = f'(\vec{g}(c)) \cdot \vec{g}'(c)$.

On the following page we trace through this logic again, but with a concrete example.

We begin with a chain of functions: $h(t) = f(\vec{g}(t))$, where
\[ \vec{g}(t) = \left\langle -2.5t^3 + 7.2t^2 - 5.9t + 1.7, \; \tfrac{4}{3}t - \tfrac{17}{15} \right\rangle, \quad f(x, y) = x^{4/3} + y^2 + 2, \quad \text{and } c = 1. \]
With $c = 1$ we compute $\vec{g}(c) = \langle 0.5, 0.2 \rangle$, $\vec{g}'(c) = \left\langle 1, \tfrac{4}{3} \right\rangle$, and $\|\vec{g}'(c)\| = \tfrac{5}{3}$. The unit vector in the direction of $\vec{g}'(c)$ is $\vec{u} = \frac{\vec{g}'(c)}{\|\vec{g}'(c)\|} = \left\langle \tfrac{3}{5}, \tfrac{4}{5} \right\rangle$. Now, every unit change in $t$ from the point $t = c$ produces $\|\vec{g}'(c)\| = \tfrac{5}{3}$ units of change in the function $\vec{g}$ in the direction of $\vec{g}'(c)$. But every unit change from the point $\vec{g}(c)$ in the direction of $\vec{g}'(c)$ produces $f_{\vec{u}}(\vec{g}(c))$ units of change in the function $f$, which equals
\[ f_{\vec{u}}(\vec{g}(c)) = f'(\vec{g}(c)) \cdot \vec{u} = f'(\vec{g}(c)) \cdot \frac{\vec{g}'(c)}{\|\vec{g}'(c)\|}. \]
Hence, the total instantaneous change in the function $h$ (per every unit change from the point $t = c$) is
\[ h'(c) = \|\vec{g}'(c)\| \left[ f'(\vec{g}(c)) \cdot \frac{\vec{g}'(c)}{\|\vec{g}'(c)\|} \right] = f'(\vec{g}(c)) \cdot \vec{g}'(c), \]
which is what the multivariable chain rule stipulates. For our given functions, we compute $f'(x, y) = \left\langle \tfrac{4}{3} x^{1/3}, 2y \right\rangle$, so $f'(\vec{g}(c)) = f'(\vec{g}(1)) = f'(0.5, 0.2) \approx \langle 1.06, 0.4 \rangle$.

Figure 14: The chain rule

Combining all this information gives a numerical value for the derivative:
\[ h'(c) = h'(1) = f'(\vec{g}(1)) \cdot \vec{g}'(1) \approx \langle 1.06, 0.4 \rangle \cdot \left\langle 1, \tfrac{4}{3} \right\rangle \approx 1.59. \]
Thus, every unit change from the point $c = 1$ on the t-axis produces a change of 1.59 units on the z-axis. In other words, the instantaneous rate of change in the function $h$ from the point $h(1) = f(\vec{g}(1))$ amounts to 1.59 units for every unit change from the point $c = 1$ on the t-axis. The length of the vertical yellow line directly above the vector $\vec{g}(c) + \vec{g}'(c)$ in Figure 14, then, must equal 1.59 units. An enlarged web illustration of these ideas can be found at the following link: http://www.westmont.edu/~howell/courses/ma-019/illustrations/derivs/chain-rule.pdf.

6.3 Leibniz Notation

The Leibniz notation for $h'(t)$ is similar to that for single variable calculus. We let $z$ represent the output of the function $f$ and also consider $z$ as a function of the variable $t$.
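We then have $f'(\vec{g}(t)) = f'(x, y) = \langle f_x, f_y \rangle = \left\langle \frac{\partial z}{\partial x}, \frac{\partial z}{\partial y} \right\rangle$ and $\vec{g}'(t) = \left\langle \frac{dx}{dt}, \frac{dy}{dt} \right\rangle$, so that
\[ \frac{dz}{dt} = \left\langle \frac{\partial z}{\partial x}, \frac{\partial z}{\partial y} \right\rangle \cdot \left\langle \frac{dx}{dt}, \frac{dy}{dt} \right\rangle = \frac{\partial z}{\partial x}\frac{dx}{dt} + \frac{\partial z}{\partial y}\frac{dy}{dt}. \]
Most texts write the above formula without the dot product, so that it reads as follows:
\[ \frac{dz}{dt} = \frac{\partial z}{\partial x}\frac{dx}{dt} + \frac{\partial z}{\partial y}\frac{dy}{dt}. \]
Great care must be taken in using this approach, however, as there are weaknesses in Leibniz notation: the functions chained together are obscured, it is not clear at what points the derivatives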

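are evaluated, and the z-variable must play a dual role (as a direct output of both $h(t)$ and $f(x, y)$). For the functions of Figure 14 we have
\[ \langle x, y \rangle = \vec{g}(t) = \left\langle -2.5t^3 + 7.2t^2 - 5.9t + 1.7, \; \tfrac{4}{3}t - \tfrac{17}{15} \right\rangle \quad \text{and} \quad z = f(x, y) = x^{4/3} + y^2 + 2. \]
Thus, $\frac{dx}{dt} = -7.5t^2 + 14.4t - 5.9$, $\frac{dy}{dt} = \tfrac{4}{3}$, $\frac{\partial z}{\partial x} = \tfrac{4}{3} x^{1/3}$, and $\frac{\partial z}{\partial y} = 2y$. Combining this information gives
\[ \frac{dz}{dt} = \left( \tfrac{4}{3} x^{1/3} \right)\left( -7.5t^2 + 14.4t - 5.9 \right) + (2y)\left( \tfrac{4}{3} \right), \]
which we evaluate at $t = 1$, and $(x, y) = (0.5, 0.2)$. Do you see why we evaluate the resulting expression at these points?

An advantage of the Leibniz notation is that it more easily reveals how to piece the various functions of a chain together. Suppose that $w$ is the output of a function of three variables, say $w = f(x, y, z)$, and that the vector $\langle x, y, z \rangle$ is the output of a function $\vec{g}(s, t) : \mathbb{R}^2 \to \mathbb{R}^3$, so that $\vec{g}(s, t) = \langle x, y, z \rangle$. Then $w$ also depends on $s$ and $t$ because $w = f(\vec{g}(s, t))$, and
\[ \frac{\partial w}{\partial s} = \frac{\partial w}{\partial x}\frac{\partial x}{\partial s} + \frac{\partial w}{\partial y}\frac{\partial y}{\partial s} + \frac{\partial w}{\partial z}\frac{\partial z}{\partial s}, \qquad \frac{\partial w}{\partial t} = \frac{\partial w}{\partial x}\frac{\partial x}{\partial t} + \frac{\partial w}{\partial y}\frac{\partial y}{\partial t} + \frac{\partial w}{\partial z}\frac{\partial z}{\partial t}. \]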
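Both Leibniz-style formulas can be tried out numerically. The Python sketch below is an illustration under stated assumptions, not part of the original notes. The first part uses the Figure 14 functions and evaluates $dz/dt$ at $t = 1$. The second part uses a hypothetical pair chosen only for this example, $f(x, y, z) = xy + z^2$ and $\vec{g}(s, t) = \langle s + t, st, s - t \rangle$, evaluated at the assumed point $(s, t) = (1, 2)$. All function names and the finite difference step are likewise assumptions.

```python
def dz_dt(t):
    """dz/dt for the Figure 14 functions, assembled term by term."""
    x = -2.5 * t**3 + 7.2 * t**2 - 5.9 * t + 1.7
    y = (4 / 3) * t - 17 / 15
    dz_dx = (4 / 3) * x ** (1 / 3)         # partial of z, evaluated at (x, y) = g(t)
    dz_dy = 2 * y
    dx_dt = -7.5 * t**2 + 14.4 * t - 5.9   # ordinary derivative, evaluated at t
    dy_dt = 4 / 3
    return dz_dx * dx_dt + dz_dy * dy_dt

def g(s, t):
    """Hypothetical inner function R^2 -> R^3 (assumed for illustration)."""
    return (s + t, s * t, s - t)

def f(x, y, z):
    """Hypothetical outer function R^3 -> R (assumed for illustration)."""
    return x * y + z**2

def w_partials(s, t):
    """(dw/ds, dw/dt) from the three-term chain rule sums."""
    x, y, z = g(s, t)
    dw_dx, dw_dy, dw_dz = y, x, 2 * z      # partials of f at g(s, t)
    dw_ds = dw_dx * 1 + dw_dy * t + dw_dz * 1
    dw_dt = dw_dx * 1 + dw_dy * s + dw_dz * (-1)
    return dw_ds, dw_dt

def w(s, t):
    """The composite w = f(g(s, t)), used only to cross-check."""
    return f(*g(s, t))

step = 1e-5  # assumed finite difference step
ws_numeric = (w(1 + step, 2) - w(1 - step, 2)) / (2 * step)
wt_numeric = (w(1, 2 + step) - w(1, 2 - step)) / (2 * step)

print("dz/dt at t = 1:", round(dz_dt(1), 2))
print("chain rule (dw/ds, dw/dt):", w_partials(1, 2))
print("finite differences:       ", round(ws_numeric, 4), round(wt_numeric, 4))
```

In the first part the partial derivatives of $z$ are evaluated at $(x, y) = \vec{g}(t)$ while $dx/dt$ and $dy/dt$ are evaluated at $t$, which is the bookkeeping that the Leibniz notation leaves implicit.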