Differentiation of Multivariable Functions

1 Introduction

Beginning calculus students identify the derivative of a function either in terms of slope or in terms of instantaneous rate of change. When thinking of the former they say something like, "$f'(a)$ is the slope of the line that is tangent to the graph of the function $f$ at the point $(a, f(a))$." Can this idea be extended to functions $f : \mathbb{R}^n \to \mathbb{R}$? For example, Figure 1 shows the graph of $f(x, y) = \cos x \sin y + 2$ for $(x, y) \in [-\pi, \pi] \times [-\pi, \pi]$. The point $(1, 2, f(1, 2)) \approx (1, 2, 2.5)$ is represented by the dark dot on the graph. What would it mean to talk about a slope at that point?

[Figure 1: What is the slope at $(1, 2, f(1, 2)) \approx (1, 2, 2.5)$?]

Answering this question might seem like a tough row to hoe. To make some headway we will revisit the derivative of elementary calculus and see if we can formulate it in a way that might apply to multivariable functions as well.

We begin by looking at functions that are not differentiable. Consider the function $f(x) = (x - 1)^{2/3} + 2$, where $-5 \le x \le 5$. Figure 2 shows its graph.

[Figure 2: $f(x) = (x - 1)^{2/3} + 2$]

Why isn't this function differentiable when $x = 1$? A common answer is that there is no tangent line to the graph of $f$ at the point $(1, f(1)) = (1, 2)$. This response, however, seems strange: as Figure 3a illustrates, there are many tangent lines we can draw at that point. What is the difference between the situation when $x = 1$ and when $x$ equals other points, where the function is differentiable? For example, consider the point $x = 2$. We readily compute $f'(x) = \frac{2}{3}(x - 1)^{-1/3}$, so $f'(2) = \frac{2}{3}$. Figure 3b shows the tangent line drawn at $(2, 3)$.

[Figure 3: A function that is not differentiable at $x = 1$ but differentiable at $x = 2$. (a) No unique tangent line at $(1, 2)$; (b) the tangent line drawn at $(2, 3)$]

The difference between the tangent line at $(2, 3)$ and any tangent line we draw at $(1, 2)$ can be explained by how well the tangent lines fit the function at the points in question. For points near $x = 2$, the tangent line is a very good approximation to the function. For points near $x = 1$, none of the candidate tangent lines is an especially good approximation to the function. This notion (goodness of fit) can be translated into algebraic terms, and in such a way that the translation generalizes nicely to functions of more than one variable.

Single Variable Goodness of Fit

The key idea is to look at the limit definition of the derivative in the language of approximations. When we write
$$f'(x_0) = \lim_{x \to x_0} \frac{f(x) - f(x_0)}{x - x_0},$$
we mean that, if $x$ is sufficiently close to $x_0$, then $f'(x_0)$ approximately equals $\frac{f(x) - f(x_0)}{x - x_0}$ (notation: $f'(x_0) \approx \frac{f(x) - f(x_0)}{x - x_0}$). Put another way, if $x$ is sufficiently close to $x_0$, then
$$f(x) \approx f(x_0) + f'(x_0)(x - x_0).$$
Now, $y = f(x_0) + f'(x_0)(x - x_0)$ is a linear function, and it is the equation of the line tangent to the graph of the function $f$ at the point $(x_0, f(x_0))$. Using the example of $f(x) = (x - 1)^{2/3} + 2$ and $x_0 = 2$, we compute the equation of the tangent line to be $y = 3 + \frac{2}{3}(x - 2)$. In general we designate the linear function we get by $L(x)$, so that $y = L(x) = f(x_0) + f'(x_0)(x - x_0)$. When $x$ is near $x_0$, the function $f$ is only approximated by the function $L$ (unless $f$ is itself a linear function, in which case $f$ would be identical with $L$).
What we need is a way to decide when the approximation is good enough to warrant dubbing the function with the lofty title "differentiable at $x_0$." The appropriate decision rule comes from noticing that $f(x) - L(x)$ is the difference in $y$-values between the function and the tangent line at the point $x$. We designate this difference by $e(x)$ and call $e(x)$ the error term. Figure 4 illustrates this idea using the points (a) $x_0 = 1$ and (b) $x_0 = 2$, respectively. The red dotted line is the tangent line, and the black line is the graph of $e(x)$.
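The comparison between $e(x)$ and the distance $|x - x_0|$ can be tabulated numerically. The sketch below is our own illustration, not part of the original notes. It computes $e(x)/|x - x_0|$ for $f(x) = (x - 1)^{2/3} + 2$ at two points:

- at $x_0 = 2$, using the tangent line $y = 3 + \frac{2}{3}(x - 2)$;
- at $x_0 = 1$, using a horizontal line through $(1, 2)$ as one hypothetical candidate.

The helper names are ours. The code evaluates $(x - 1)^{2/3}$ as the real cube root of $(x - 1)^2$, so that Python stays in real arithmetic when $x < 1$.

```python
def f(x):
    # f(x) = (x - 1)^(2/3) + 2, evaluated as the real cube root of (x - 1)^2
    # so the result stays real when x < 1
    return ((x - 1) ** 2) ** (1 / 3) + 2


def tangent_at_2(x):
    # L(x) = 3 + (2/3)(x - 2): the tangent line at x0 = 2
    return 3 + (2 / 3) * (x - 2)


def candidate_at_1(x, slope=0.0):
    # Hypothetical candidate "tangent" through (1, 2); slope 0 is just one choice
    return 2 + slope * (x - 1)


def error_ratio(func, line, x0, x):
    # e(x) / |x - x0|, where e(x) = func(x) - line(x)
    return (func(x) - line(x)) / abs(x - x0)


for h in [1e-1, 1e-2, 1e-3, 1e-4]:
    near_2 = error_ratio(f, tangent_at_2, 2, 2 + h)
    near_1 = error_ratio(f, candidate_at_1, 1, 1 + h)
    print(f"h = {h:g}:  x0 = 2 ratio {near_2:+.6f},  x0 = 1 ratio {near_1:.3f}")
```

Near $x_0 = 2$ the ratio shrinks roughly in proportion to $h$. Near $x_0 = 1$ it grows like $h^{-1/3}$. Giving the candidate line a nonzero slope $s$ only subtracts $s$ from the ratio (for $h > 0$) and does not stop it from growing.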

[Figure 4: Error functions (in black) associated with the points (a) $x_0 = 1$ (a poor error function) and (b) $x_0 = 2$ (a good error function)]

In Figure 4b, as $x \to x_0$ (i.e., as $x \to 2$) the error term $e(x)$ approaches zero much more quickly than does the distance between $x$ and $x_0$. In other words,
$$\lim_{x \to x_0} \frac{e(x)}{|x - x_0|} = 0.$$
This is not the case as $x \to 1$ in Figure 4a. It can be shown that the stipulation $\lim_{x \to x_0} \frac{e(x)}{|x - x_0|} = 0$ is logically equivalent to the claim that the function is differentiable at $x_0$. To see how, first note that $\left|\frac{e(x)}{|x - x_0|}\right| = \left|\frac{e(x)}{x - x_0}\right|$, so it suffices to study $\frac{e(x)}{x - x_0}$. For $x \ne x_0$,
$$\frac{e(x)}{x - x_0} = \frac{f(x) - L(x)}{x - x_0} = \frac{f(x) - [f(x_0) + f'(x_0)(x - x_0)]}{x - x_0} = \frac{f(x) - f(x_0)}{x - x_0} - \frac{f'(x_0)(x - x_0)}{x - x_0} = \frac{f(x) - f(x_0)}{x - x_0} - f'(x_0),$$
and if $f$ is differentiable at $x_0$ the last expression will approach zero as $x$ approaches $x_0$. Thus, a function $f : \mathbb{R} \to \mathbb{R}$ is differentiable at the point $x_0$ just in case there exists a linear function $L : \mathbb{R} \to \mathbb{R}$ such that $f(x) = L(x) + e(x)$, where
$$\lim_{x \to x_0} \frac{e(x)}{|x - x_0|} = 0.$$
In other words, there is a linear function that serves as a very good approximation to the function $f$ for points $x$ near $x_0$.

Exercise 1.1 Find the functions $L(x)$ and $e(x)$ that show $f(x) = x^2$ is differentiable at the point $x_0 = 3$. Be sure to justify your assertions by showing $\lim_{x \to x_0} \frac{e(x)}{|x - x_0|} = 0$.

2 Derivatives in Higher Dimensions

The spade work we did with single variable functions has loosened the soil to the point where we can now see how to define differentiability in higher dimensions. We focus on $f : \mathbb{R}^2 \to \mathbb{R}$ and leave as an exercise the corresponding definition for $n$ dimensions. Recall that a linear function of two variables has a plane as its graph and has the general form $z = L(x, y) = ax + by + c$. In what follows we will consider points as vectors drawn from the origin. The point $(x_0, y_0)$ will be identified with $\vec{x}_0 = \langle x_0, y_0 \rangle$, and the point $(x, y)$ will be identified with $\vec{x} = \langle x, y \rangle$.
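Definition 2.1 $f : \mathbb{R}^2 \to \mathbb{R}$ is differentiable at $\vec{x}_0 = (x_0, y_0) \in \mathbb{R}^2$ provided there exists a linear function $L(\vec{x}) = L(x, y) = ax + by + c$ such that $f(\vec{x}) = L(\vec{x}) + e(\vec{x})$, where
$$\lim_{\vec{x} \to \vec{x}_0} \frac{e(\vec{x})}{\|\vec{x} - \vec{x}_0\|} = 0.$$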

Geometrically, a function $f : \mathbb{R}^2 \to \mathbb{R}$ is differentiable at the point $\vec{x}_0$ when there exists a tangent plane that serves as a good approximation to the function $f$ for points $\vec{x} = (x, y)$ near $\vec{x}_0 = (x_0, y_0)$. With $\vec{x}_0 = (x_0, y_0) = (1, 2)$, Figure 5 shows the tangent plane to the graph of $f(x, y) = \cos x \sin y + 2$ at the point $(x_0, y_0, f(x_0, y_0)) = (1, 2, f(1, 2)) \approx (1, 2, 2.5)$.

[Figure 5: The tangent plane drawn at $(1, 2, f(1, 2)) \approx (1, 2, 2.5)$]

If $\vec{x} = (x, y)$ is near the point $\vec{x}_0 = (1, 2)$, then $L(x, y)$ will be close to $f(x, y)$. So the point $(x, y, L(x, y))$ on the tangent plane will be close to the corresponding point $(x, y, f(x, y))$ on the graph of $f$. Figure 6 illustrates this fact for a point $(x, y)$ near $(1, 2)$. Following the vertical dashed line up from $(x, y)$ leads first to the point $(x, y, f(x, y))$, which is on the graph of the function. The next point is the approximating point $(x, y, L(x, y))$, which is on the tangent plane.

[Figure 6: The point $(x, y, L(x, y))$ is close to the point $(x, y, f(x, y))$]

Exercise 2.1 Define what it means to say that $f : \mathbb{R}^n \to \mathbb{R}$ is differentiable at the point $\vec{x}_0 \in \mathbb{R}^n$.

In the next section we will see, given a function $f : \mathbb{R}^n \to \mathbb{R}$, how to construct the corresponding approximation function $L$. We close here with a verification that a given linear function $L$ (meant to serve as an approximation to a function $f$ at a particular point) satisfies Definition 2.1.

Example 2.1 Show that the function $f(\vec{x}) = f(x, y) = x^2 + 2y^3 + 6$ is differentiable at $\vec{x}_0 = (2, 1)$ by showing that $L(\vec{x}) = L(x, y) = 4x + 6y - 2$ satisfies the requirements of Definition 2.1.

Solution. We compute
$$\begin{aligned}
0 \le \lim_{\vec{x} \to \vec{x}_0} \frac{|e(\vec{x})|}{\|\vec{x} - \vec{x}_0\|}
&= \lim_{\vec{x} \to \vec{x}_0} \frac{|f(\vec{x}) - L(\vec{x})|}{\|\vec{x} - \vec{x}_0\|}
= \lim_{(x,y) \to (2,1)} \frac{|(x^2 + 2y^3 + 6) - (4x + 6y - 2)|}{\|(x, y) - (2, 1)\|} \\
&= \lim_{(x,y) \to (2,1)} \frac{|x^2 - 4x + 2y^3 - 6y + 8|}{\sqrt{(x-2)^2 + (y-1)^2}}
= \lim_{(x,y) \to (2,1)} \frac{|(x-2)^2 + 2(y-1)^2(y+2)|}{\sqrt{(x-2)^2 + (y-1)^2}} \\
&= \lim_{(x,y) \to (2,1)} \frac{(x-2)^2 + 2(y-1)^2(y+2)}{\sqrt{(x-2)^2 + (y-1)^2}} && \text{(Why can we remove the absolute value signs?)} \\
&\le \lim_{(x,y) \to (2,1)} \frac{(x-2)^2 + 8(y-1)^2}{\sqrt{(x-2)^2 + (y-1)^2}} && \text{(Provided $y$ is close to 1. Why?)} \\
&\le \lim_{(x,y) \to (2,1)} \frac{8(x-2)^2 + 8(y-1)^2}{\sqrt{(x-2)^2 + (y-1)^2}}
= \lim_{(x,y) \to (2,1)} 8\sqrt{(x-2)^2 + (y-1)^2} = 0.
\end{aligned}$$
The above inequalities give $0 \le \lim_{\vec{x} \to \vec{x}_0} \frac{|e(\vec{x})|}{\|\vec{x} - \vec{x}_0\|} \le 0$. The only way this last expression can be satisfied is to have $\lim_{\vec{x} \to \vec{x}_0} \frac{|e(\vec{x})|}{\|\vec{x} - \vec{x}_0\|} = 0$. This condition is sufficient to guarantee that $\lim_{\vec{x} \to \vec{x}_0} \frac{e(\vec{x})}{\|\vec{x} - \vec{x}_0\|} = 0$.

Exercise 2.2 Answer the absolute value question posed in the above solution.

Exercise 2.3 Answer the "$y$ is close to 1" question posed in the above solution.

Exercise 2.4 Prove that the function $f(\vec{x}) = f(x, y) = 3x^2 + y + 2$ is differentiable at $\vec{x}_0 = (2, 1)$ by showing that $L(\vec{x}) = L(x, y) = 12x + y - 10$ satisfies the requirements of Definition 2.1.
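A numerical companion to Example 2.1 follows; it is a sketch, not a proof. It first checks the factorization used in the solution at a few random points. It then watches $|e(\vec{x})| / \|\vec{x} - \vec{x}_0\|$ as $\vec{x}$ approaches $(2, 1)$ along several directions, confirming the bound $8\|\vec{x} - \vec{x}_0\|$ from the last step of the solution. The direction angles, radii, and random seed are our own choices.

```python
import math
import random


def f(x, y):
    return x**2 + 2 * y**3 + 6


def L(x, y):
    return 4 * x + 6 * y - 2


x0, y0 = 2.0, 1.0


def ratio(x, y):
    # |e(x)| / ||x - x0||, the quantity Definition 2.1 needs to tend to 0
    return abs(f(x, y) - L(x, y)) / math.hypot(x - x0, y - y0)


# Spot-check the factorization used in the solution:
# x^2 - 4x + 2y^3 - 6y + 8 == (x - 2)^2 + 2(y - 1)^2 (y + 2)
random.seed(0)
for _ in range(5):
    x, y = random.uniform(-5, 5), random.uniform(-5, 5)
    lhs = x**2 - 4 * x + 2 * y**3 - 6 * y + 8
    rhs = (x - 2) ** 2 + 2 * (y - 1) ** 2 * (y + 2)
    assert math.isclose(lhs, rhs, rel_tol=1e-9, abs_tol=1e-9)

# Approach (2, 1) along a few directions; near (2, 1) the ratio should
# stay below the bound 8 * ||x - x0|| from the last step of the solution
for theta in [0.0, math.pi / 4, 2.0, math.pi, 4.5]:
    for r in [1e-1, 1e-2, 1e-3]:
        x, y = x0 + r * math.cos(theta), y0 + r * math.sin(theta)
        q = ratio(x, y)
        assert q <= 8 * r
        print(f"theta = {theta:.3f}, r = {r:g}: ratio = {q:.2e}")
```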

3 Finding the Derivative

3.1 Motivation

In the last section we defined when a function $f : \mathbb{R}^n \to \mathbb{R}$ is differentiable at $\vec{x}_0$. It is differentiable when there is a linear function $L : \mathbb{R}^n \to \mathbb{R}$ that is a good approximation to $f$ for points $\vec{x}$ near $\vec{x}_0$. We never said, however, what the derivative was. For functions $f : \mathbb{R} \to \mathbb{R}$, we have
$$f(x) \approx L(x) = f(x_0) + f'(x_0)(x - x_0), \tag{1}$$
and the derivative itself is the quantity $f'(x_0)$. In this section we seek an analogous expression for $L$ in higher dimensions.

We start by assuming a function $f : \mathbb{R}^2 \to \mathbb{R}$ is differentiable at $\vec{x}_0$. Thus, there is a linear approximation $L(\vec{x}) = L(x, y) = ax + by + c$ that satisfies the requirements of Definition 2.1. Now, $L$ is the equation of a tangent plane that touches the graph of the function $f$ at the point of tangency. Thus, $f(x_0, y_0)$ must equal $L(x_0, y_0)$, which in turn equals $ax_0 + by_0 + c$. Therefore,
$$L(x, y) = (ax_0 + by_0 + c) + (ax + by + c) - (ax_0 + by_0 + c),$$
or
$$L(x, y) = f(x_0, y_0) + \langle a, b \rangle \cdot \langle x - x_0, y - y_0 \rangle,$$
or
$$L(\vec{x}) = f(\vec{x}_0) + \langle a, b \rangle \cdot (\vec{x} - \vec{x}_0). \tag{2}$$
Equations (1) and (2) jointly clarify what the derivative of a function $f : \mathbb{R}^2 \to \mathbb{R}$ should be. It should be a vector. In fact, it should equal the vector $\langle a, b \rangle$, and our aesthetic convictions almost compel us to write $f'(\vec{x}_0) = \langle a, b \rangle$. Then Equation (2) gives
$$f(\vec{x}) \approx L(\vec{x}) = f(\vec{x}_0) + f'(\vec{x}_0) \cdot (\vec{x} - \vec{x}_0), \tag{3}$$
which is virtually the same form as Equation (1). How sweet it is!

3.2 Calculation Strategy

The notational serendipity of Equation (3) is all well and good, but somewhat useless unless we can come up with a convenient way to determine what the vector $\langle a, b \rangle$ is for a given function $f(x, y)$ at a particular point $\vec{x}_0 = (x_0, y_0)$. Again we think of this point as also a vector drawn from the origin. Referring to Example 2.1, where $L(x, y) = 4x + 6y - 2$ (and therefore $f'(2, 1) = \langle 4, 6 \rangle$), we seek a way to calculate the vector $\langle a, b \rangle = \langle 4, 6 \rangle$.
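According to Definition 2.1, if $f : \mathbb{R}^2 \to \mathbb{R}$ is differentiable at $\vec{x}_0$, then
$$\lim_{\vec{x} \to \vec{x}_0} \frac{e(\vec{x})}{\|\vec{x} - \vec{x}_0\|} = 0. \tag{4}$$
Keeping in mind that $e(\vec{x}) = f(\vec{x}) - L(\vec{x})$, and using Equation (2), we see that Equation (4) reduces to
$$\begin{aligned}
0 &= \lim_{\vec{x} \to \vec{x}_0} \frac{f(\vec{x}) - f(\vec{x}_0) - \langle a, b \rangle \cdot (\vec{x} - \vec{x}_0)}{\|\vec{x} - \vec{x}_0\|} \\
&= \lim_{(x,y) \to (x_0,y_0)} \frac{f(x, y) - f(x_0, y_0) - a(x - x_0) - b(y - y_0)}{\sqrt{(x - x_0)^2 + (y - y_0)^2}} \\
&= \lim_{(x,y) \to (x_0,y_0)} \left[ \frac{f(x, y) - f(x_0, y_0)}{\sqrt{(x - x_0)^2 + (y - y_0)^2}} - \frac{a(x - x_0) + b(y - y_0)}{\sqrt{(x - x_0)^2 + (y - y_0)^2}} \right].
\end{aligned} \tag{5}$$
To find the values of $a$ and $b$ we invoke a clever ploy. Because the limit of Expression (5) exists (and equals zero), the limit will be zero regardless of the direction from which $(x, y)$ approaches $(x_0, y_0)$. By picking special directions we can get formulas for the values of $a$ and $b$. Let's see how.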

First, let $(x, y)$ approach $(x_0, y_0)$ in a direction parallel to the $x$-axis. The $y$-coordinate is thus fixed at $y_0$, so $(x, y)$ equals $(x, y_0)$. Figure 7 illustrates that, with this restriction, there are only two possible directions of approach. In Figure 7a $(x, y_0)$ approaches $(x_0, y_0)$ from the right, so $x > x_0$. In Figure 7b $(x, y_0)$ approaches $(x_0, y_0)$ from the left, so $x < x_0$.

[Figure 7: Approaching $(x_0, y_0)$ along a line parallel to the $x$-axis. (a) $(x, y_0) \to (x_0, y_0)$ from the right; (b) $(x, y_0) \to (x_0, y_0)$ from the left]

Next, recast Equation (5) with $(x, y)$ approaching $(x_0, y_0)$ parallel to the $x$-axis, so $(x, y) = (x, y_0)$:
$$\begin{aligned}
0 &= \lim_{(x,y_0) \to (x_0,y_0)} \left[ \frac{f(x, y_0) - f(x_0, y_0)}{\sqrt{(x - x_0)^2 + (y_0 - y_0)^2}} - \frac{a(x - x_0) + b(y_0 - y_0)}{\sqrt{(x - x_0)^2 + (y_0 - y_0)^2}} \right] \\
&= \lim_{(x,y_0) \to (x_0,y_0)} \left[ \frac{f(x, y_0) - f(x_0, y_0)}{\sqrt{(x - x_0)^2}} - \frac{a(x - x_0)}{\sqrt{(x - x_0)^2}} \right] \\
&= \lim_{(x,y_0) \to (x_0,y_0)} \left[ \frac{f(x, y_0) - f(x_0, y_0)}{|x - x_0|} - \frac{a(x - x_0)}{|x - x_0|} \right].
\end{aligned} \tag{6}$$
Suppose $(x, y_0)$ approaches $(x_0, y_0)$ from the right (symbolically, $(x, y_0) \to (x_0, y_0)^+$). Then the absolute value $|x - x_0|$ equals $x - x_0$ because $x > x_0$, and Equation (6) becomes
$$0 = \lim_{(x,y_0) \to (x_0,y_0)^+} \left[ \frac{f(x, y_0) - f(x_0, y_0)}{x - x_0} - a \right],$$
from which we conclude that
$$\lim_{(x,y_0) \to (x_0,y_0)^+} \frac{f(x, y_0) - f(x_0, y_0)}{x - x_0} = a. \tag{7}$$
Suppose instead that $(x, y_0)$ approaches $(x_0, y_0)$ from the left (symbolically, $(x, y_0) \to (x_0, y_0)^-$). Then the absolute value $|x - x_0|$ equals $-(x - x_0)$ because $x < x_0$, and Equation (6) becomes
$$0 = \lim_{(x,y_0) \to (x_0,y_0)^-} \left[ \frac{f(x, y_0) - f(x_0, y_0)}{-(x - x_0)} + a \right],$$
from which we conclude that
$$\lim_{(x,y_0) \to (x_0,y_0)^-} \frac{f(x, y_0) - f(x_0, y_0)}{x - x_0} = a. \tag{8}$$
Finally, combining Equations (7) and (8) reveals a method for computing the coefficient $a$:
$$a = \lim_{(x,y_0) \to (x_0,y_0)} \frac{f(x, y_0) - f(x_0, y_0)}{x - x_0} = \lim_{x \to x_0} \frac{f(x, y_0) - f(x_0, y_0)}{x - x_0}. \tag{9}$$
Remark. Note the notational change in Equation (9). Because the $y$-value is always fixed at $y_0$, only the $x$-value is changing. In this context the expression $\lim_{(x,y_0) \to (x_0,y_0)}$ is equivalent to $\lim_{x \to x_0}$.
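Equations (7) and (8) can be watched numerically. The following sketch is our own illustration and uses the function of Example 2.1 at $(x_0, y_0) = (2, 1)$. It holds $y$ fixed at $y_0$ and evaluates the difference quotient for shrinking $h$, both from the right ($x = x_0 + h$) and from the left ($x = x_0 - h$).

```python
def f(x, y):
    # The function of Example 2.1
    return x**2 + 2 * y**3 + 6


x0, y0 = 2.0, 1.0


def quotient(x):
    # [f(x, y0) - f(x0, y0)] / (x - x0), with y held fixed at y0
    return (f(x, y0) - f(x0, y0)) / (x - x0)


for h in [1e-1, 1e-2, 1e-3, 1e-4]:
    from_right = quotient(x0 + h)  # x > x0, as in Equation (7)
    from_left = quotient(x0 - h)   # x < x0, as in Equation (8)
    print(f"h = {h:g}:  right {from_right:.6f},  left {from_left:.6f}")
```

For this particular $f$ the quotient simplifies algebraically to $x + x_0$. So, up to rounding, the right-hand values are $4 + h$ and the left-hand values are $4 - h$, and both sides approach $a = 4$.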

Example 3.1 Apply Equation (9) to the function of Example 2.1.

Solution. We calculate
$$\lim_{x \to x_0} \frac{f(x, y_0) - f(x_0, y_0)}{x - x_0} = \lim_{x \to x_0} \frac{(x^2 + 2y_0^3 + 6) - (x_0^2 + 2y_0^3 + 6)}{x - x_0} = \lim_{x \to x_0} \frac{x^2 - x_0^2}{x - x_0} = \lim_{x \to x_0} (x + x_0) = 2x_0.$$
Because $(x_0, y_0) = (2, 1)$, we deduce that $a = \lim_{x \to x_0} (x + x_0) = 2x_0 = 2(2) = 4$.

Remark. In Example 2.1 the linear function approximating $f(x, y) = x^2 + 2y^3 + 6$ was given by $L(x, y) = ax + by + c = 4x + 6y - 2$. Thus, $\langle a, b \rangle = \langle 4, 6 \rangle$, and Equation (9) correctly produced the coefficient $a = 4$, as advertised.

A formula for the coefficient $b$ of Equation (2) comes from an argument similar to the one that produced Equation (9). This time we let $(x, y)$ approach $(x_0, y_0)$ along a line parallel to the $y$-axis. The $x$-coordinate is thus fixed at $x_0$, so $(x, y)$ equals $(x_0, y)$. Doing so ultimately gives
$$b = \lim_{(x_0,y) \to (x_0,y_0)} \frac{f(x_0, y) - f(x_0, y_0)}{y - y_0} = \lim_{y \to y_0} \frac{f(x_0, y) - f(x_0, y_0)}{y - y_0}. \tag{10}$$
The details are left as an exercise.

Exercise 3.1 Show that the coefficient $b$ in Equation (2) is given by Equation (10). To do so let $(x, y)$ approach $(x_0, y_0)$ along a line parallel to the $y$-axis, keeping the $x$-value fixed at $x_0$.

Equations (9) and (10) are so important that they form the basis for the following definitions.

Definition 3.1 We call $\lim_{x \to x_0} \frac{f(x, y_0) - f(x_0, y_0)}{x - x_0}$ the partial derivative of $f$ with respect to $x$ evaluated at the point $(x_0, y_0)$, and we write
$$f_x(x_0, y_0) = \lim_{x \to x_0} \frac{f(x, y_0) - f(x_0, y_0)}{x - x_0}.$$
With the substitution $x = x_0 + h$, this can also be written as
$$f_x(x_0, y_0) = \lim_{h \to 0} \frac{f(x_0 + h, y_0) - f(x_0, y_0)}{h}.$$

Definition 3.2 We call $\lim_{y \to y_0} \frac{f(x_0, y) - f(x_0, y_0)}{y - y_0}$ the partial derivative of $f$ with respect to $y$ evaluated at the point $(x_0, y_0)$, and we write
$$f_y(x_0, y_0) = \lim_{y \to y_0} \frac{f(x_0, y) - f(x_0, y_0)}{y - y_0},$$
which can also be written as
$$f_y(x_0, y_0) = \lim_{h \to 0} \frac{f(x_0, y_0 + h) - f(x_0, y_0)}{h}.$$

Remark. There is nothing magic about the point $(x_0, y_0)$, and we could just as well talk about the partials evaluated at $(x, y)$, $(u, v)$, or any point in $\mathbb{R}^2$. In other words,
$$f_x(x, y) = \lim_{h \to 0} \frac{f(x + h, y) - f(x, y)}{h} \quad \text{and} \quad f_y(x, y) = \lim_{h \to 0} \frac{f(x, y + h) - f(x, y)}{h},$$
provided, of course, that the limits exist.

Exercise 3.2 Use Definition 3.2 to verify that, at the point $\vec{x}_0 = (2, 1)$, the coefficient $b$ for the function of Example 2.1 is indeed equal to 6.
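The $h$-forms of Definitions 3.1 and 3.2 translate directly into forward difference quotients. In the sketch below, `partial_x` and `partial_y` are hypothetical helper names. With a small fixed $h$ they only approximate the limits, which is why the definitions take $h \to 0$. Following the Remark, the point is an arbitrary one, $(3, -0.5)$, rather than $(x_0, y_0)$.

```python
def f(x, y):
    # Example 2.1 again: f(x, y) = x^2 + 2y^3 + 6
    return x**2 + 2 * y**3 + 6


def partial_x(func, x, y, h):
    # h-form quotient of Definition 3.1: [f(x + h, y) - f(x, y)] / h
    return (func(x + h, y) - func(x, y)) / h


def partial_y(func, x, y, h):
    # h-form quotient of Definition 3.2: [f(x, y + h) - f(x, y)] / h
    return (func(x, y + h) - func(x, y)) / h


# Per the Remark, any point will do; (3, -0.5) is an arbitrary choice.
x, y = 3.0, -0.5
for h in [1e-1, 1e-3, 1e-5]:
    print(f"h = {h:g}:  f_x ~ {partial_x(f, x, y, h):.6f},  f_y ~ {partial_y(f, x, y, h):.6f}")
```

For this $f$ the $x$-quotient equals $2x + h$ exactly. As $h \to 0$ it approaches $2x$, matching $a = 2x_0$ from Example 3.1.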

If $z = f(x, y)$, you will see many of the following notations to indicate partial derivatives:
$$f_x(x, y) = \frac{\partial z}{\partial x} = \frac{\partial f}{\partial x} = \frac{\partial}{\partial x} f(x, y) = f_1(x, y) = f^{(1,0)}(x, y) = D_1 f = D_x f;$$
$$f_y(x, y) = \frac{\partial z}{\partial y} = \frac{\partial f}{\partial y} = \frac{\partial}{\partial y} f(x, y) = f_2(x, y) = f^{(0,1)}(x, y) = D_2 f = D_y f.$$
The good news in doing calculations comes from realizing that partial derivatives are nothing more than ordinary derivatives. When we take the partial derivative with respect to $x$, the $y$-variable stays fixed, so we can treat it as if it were a constant. Similarly, the $x$-variable acts like a constant when we compute the partial derivative with respect to $y$.

Example 3.2
$$\frac{\partial}{\partial x}\left(x^5 y^3 + e^{xy^2}\right) = 5x^4 y^3 + y^2 e^{xy^2}; \qquad \frac{\partial}{\partial y}\left(x^5 y^3 + e^{xy^2}\right) = 3x^5 y^2 + 2xy e^{xy^2}.$$

Example 3.3
$$\frac{\partial}{\partial x}\left(\frac{x^3 + y^4}{x^2 y + y^3 x}\right) = \frac{(x^2 y + y^3 x)\frac{\partial}{\partial x}(x^3 + y^4) - (x^3 + y^4)\frac{\partial}{\partial x}(x^2 y + y^3 x)}{(x^2 y + y^3 x)^2} = \frac{(x^2 y + y^3 x)(3x^2) - (x^3 + y^4)(2xy + y^3)}{(x^2 y + y^3 x)^2}.$$

Example 3.4 Evaluate $f_x(0.25, 1.25)$ and $f_y(0.25, 1.25)$ for the function $f(x, y) = \cos x \sin y + 2$.

Solution. $f_x(x, y) = -\sin x \sin y$, so $f_x(0.25, 1.25) = -\sin(0.25)\sin(1.25) \approx -0.23$; $f_y(x, y) = \cos x \cos y$, so $f_y(0.25, 1.25) = \cos(0.25)\cos(1.25) \approx 0.31$.

Example 3.5 Use the result of Example 3.4 to find the equation of the plane tangent to the graph of $f(x, y) = \cos x \sin y$ at the point $(0.25, 1.25, f(0.25, 1.25)) \approx (0.25, 1.25, 0.92)$. (Dropping the constant 2 does not change the partial derivatives.)

Solution. According to Equation (3) the tangent plane is the graph of the linear function $L$ given by
$$\begin{aligned}
z = L(x, y) &= f(0.25, 1.25) + f'(0.25, 1.25) \cdot \langle x - 0.25, y - 1.25 \rangle \\
&\approx 0.92 + \langle -0.23, 0.31 \rangle \cdot \langle x - 0.25, y - 1.25 \rangle && \text{[From Example 3.4]} \\
&= 0.92 + (-0.23)(x - 0.25) + (0.31)(y - 1.25) \\
&= -0.23x + 0.31y + 0.59.
\end{aligned}$$

Exercise 3.3 Find the equation of the plane tangent to the graph of $f(x, y) = x^2 + y^2 + x^2 y$ at the point $(2, 3, f(2, 3)) = (2, 3, 25)$.

Exercise 3.4 Find $\frac{\partial f}{\partial x}$ and $\frac{\partial f}{\partial y}$ if $f(x, y) = x e^{x^2 y^2}$.

Exercise 3.5 Find $\frac{\partial f}{\partial x}$ and $\frac{\partial f}{\partial y}$ if $f(x, y) = \sin\left(\frac{x}{y}\right)$.
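The arithmetic of Examples 3.4 and 3.5 can be reproduced in a few lines. This sketch is our own and uses only Python's `math` module. It codes the hand-computed partials, assembles the tangent plane of Equation (3), and cross-checks the partials with central difference quotients.

```python
import math


def f(x, y):
    # Example 3.5's function; the "+ 2" of Example 3.4 would not change any partial
    return math.cos(x) * math.sin(y)


def f_x(x, y):
    return -math.sin(x) * math.sin(y)


def f_y(x, y):
    return math.cos(x) * math.cos(y)


x0, y0 = 0.25, 1.25
a, b = f_x(x0, y0), f_y(x0, y0)


def L(x, y):
    # Equation (3): f(x0) + <a, b> . (x - x0)
    return f(x0, y0) + a * (x - x0) + b * (y - y0)


c = L(0.0, 0.0)  # constant term when the plane is written as z = a x + b y + c
print(f"z = {a:.4f} x + {b:.4f} y + {c:.4f}")

# Cross-check the hand-computed partials with central difference quotients
h = 1e-6
print((f(x0 + h, y0) - f(x0 - h, y0)) / (2 * h), a)
print((f(x0, y0 + h) - f(x0, y0 - h)) / (2 * h), b)
```

Carrying full precision gives a constant term of about 0.596 rather than 0.59. The text's value comes from rounding $f(0.25, 1.25)$, $f_x$, and $f_y$ to two decimals before combining them.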

3.3 Geometric Interpretation

Imagine the graph of a function $f : \mathbb{R}^2 \to \mathbb{R}$, and slice this graph with the plane $y = y_0$. Figure 8a illustrates this process for the function $f(x, y) = 7 - x^2 - y^2$ and the plane $y = 1$. The plane intersects the graph in a curve that is traced out by the vector-valued function $\vec{r}(t) = \langle t, y_0, f(t, y_0) \rangle$, for $t \in [a, b]$. Figure 8b illustrates this idea for $f(x, y) = 7 - x^2 - y^2$, with $y_0 = 1$. The dark curved line on the surface of the graph is the path traced by $\vec{r}(t) = \langle t, 1, f(t, 1) \rangle$ for $-1.7 \le t \le 1.7$, so $\vec{r}(t) = \langle t, 1, 6 - t^2 \rangle$.

In general, $f_x(x_0, y_0)$ is the derivative with respect to $x$ of the function $f(x, y_0)$ evaluated at $x = x_0$. It is also the derivative of the third component of the vector-valued function $\vec{r}(t) = \langle t, y_0, f(t, y_0) \rangle$ evaluated at $t = x_0$. In other words, it is the third component of
$$\vec{r}\,'(x_0) = \left\langle 1, 0, \left.\frac{d}{dt} f(t, y_0)\right|_{t = x_0} \right\rangle = \langle 1, 0, f_x(x_0, y_0) \rangle.$$
For our specific example, $\vec{r}\,'(t) = \langle 1, 0, -2t \rangle$, so $\vec{r}\,'(x_0) = \langle 1, 0, -2x_0 \rangle$. Now, $f_x(x, y) = -2x$, so if we set $(x_0, y_0) = (\frac{1}{2}, 1)$, we get $f_x(x_0, y_0) = f_x(\frac{1}{2}, 1) = -1$. This value is identical to the third component of $\vec{r}\,'(x_0) = \vec{r}\,'(\frac{1}{2}) = \langle 1, 0, -1 \rangle$.

[Figure 8: Geometric interpretation of the partial derivative. (a) Slicing the surface with a plane; (b) the resulting curve]

The geometric interpretation of $f_x(\frac{1}{2}, 1)$ becomes clear by looking at $\vec{r}\,'(\frac{1}{2}) = \langle 1, 0, -1 \rangle$. This equation tells us what happens at the point $(\frac{1}{2}, 1)$. Every one-unit change in the $x$ direction, with zero-unit change in the $y$ direction, produces an instantaneous rate of change in the $z$ direction of $-1$ unit.

Similarly, slicing the graph of $f$ by the plane $x = 1$ produces a curve $\vec{r}(t) = \langle 1, t, f(1, t) \rangle$, as Figure 9 shows. This time we compute
$$\vec{r}\,'(y_0) = \left\langle 0, 1, \left.\frac{d}{dt} f(x_0, t)\right|_{t = y_0} \right\rangle = \langle 0, 1, f_y(x_0, y_0) \rangle.$$
For $\vec{r}(t) = \langle 1, t, 6 - t^2 \rangle$, we get $\vec{r}\,'(t) = \langle 0, 1, -2t \rangle$. The third component of this vector is the partial of $f$ with respect to $y$ evaluated at $(x, y) = (1, t)$. If we set $(x_0, y_0) = (1, \frac{2}{5})$, then $f_y(1, \frac{2}{5}) = -\frac{4}{5}$, and $\vec{r}\,'(\frac{2}{5}) = \langle 0, 1, -\frac{4}{5} \rangle$. The interpretation concerns the point $(1, \frac{2}{5})$. There, every zero-unit change in the $x$ direction, with one-unit change in the $y$ direction, produces an instantaneous rate of change in the $z$ direction of $-\frac{4}{5}$ units.
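The tangent vectors $\vec{r}\,'(\frac{1}{2})$ and $\vec{r}\,'(\frac{2}{5})$ can be approximated numerically by applying central differences to each component. In the sketch below the helper names `r`, `s`, and `velocity` are ours. Here `s` is the curve that the text also calls $\vec{r}$ for the slice $x = 1$.

```python
def f(x, y):
    return 7 - x**2 - y**2


def r(t, y0=1.0):
    # Slice by the plane y = y0: r(t) = <t, y0, f(t, y0)>
    return (t, y0, f(t, y0))


def s(t, x0=1.0):
    # Slice by the plane x = x0: <x0, t, f(x0, t)> (also called r(t) in the text)
    return (x0, t, f(x0, t))


def velocity(curve, t, h=1e-6):
    # Componentwise central difference approximating the tangent vector at t
    ahead, behind = curve(t + h), curve(t - h)
    return tuple((p - q) / (2 * h) for p, q in zip(ahead, behind))


print(velocity(r, 0.5))  # third component approximates f_x(1/2, 1)
print(velocity(s, 0.4))  # third component approximates f_y(1, 2/5)
```

The third components should come out near $-1$ and $-0.8$, matching $f_x(\frac{1}{2}, 1)$ and $f_y(1, \frac{2}{5})$.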

[Figure 9: The partial derivative with respect to $y$ evaluated at $(x_0, y_0)$]

To see an animation of Figures 8b and 9 visit the following link: http://www.westmont.edu/~howell/courses/ma-019/illustrations/derivs/p-deriv.html.

4 Higher Derivatives and Higher Dimensions

Partial derivatives can be evaluated at various points, so we can consider them to be functions. For example, if $f(x, y) = \sin(xy)$, then $f_x(x, y) = y\cos(xy)$, so $f_x : \mathbb{R}^2 \to \mathbb{R}$. Thus, partial derivatives themselves may have partial derivatives. What notation should we use to indicate the partial derivative with respect to $y$ of the function $f_x$? A natural choice is $f_{xy}$, because reading left to right gives us the partials in the order that they occur. Taking the partial derivative with respect to $x$ of the function $f_y$ would be expressed as $f_{yx}$. The game doesn't stop here. We can compute the partial derivative of the function $f_{xy}$ with respect to $x$, and so forth. If we write $z = f(x, y)$, we can express these higher derivatives notationally in a variety of ways:
$$\begin{aligned}
f_{xx}(x, y) &= \frac{\partial^2 z}{\partial x^2} = \frac{\partial^2 f}{\partial x^2} = D_{xx} f = f^{(2,0)}(x, y); \\
f_{xy}(x, y) &= \frac{\partial^2 z}{\partial y \, \partial x} = \frac{\partial^2 f}{\partial y \, \partial x} = D_{xy} f = f^{(1,1)}(x, y); \\
f_{xyy}(x, y) &= \frac{\partial^3 z}{\partial y^2 \, \partial x} = \frac{\partial^3 f}{\partial y^2 \, \partial x} = D_{xyy} f = f^{(1,2)}(x, y); \\
f_{xyx}(x, y) &= \frac{\partial^3 z}{\partial x \, \partial y \, \partial x} = \frac{\partial^3 f}{\partial x \, \partial y \, \partial x} = D_{xyx} f = f^{(2,1)}(x, y).
\end{aligned}$$
Notice that the higher partial $f_{xy}$ reads left to right, but the symbol $\frac{\partial^2 f}{\partial y \, \partial x}$ reads right to left. The reason for this apparent anomaly is that $\frac{\partial^2 f}{\partial y \, \partial x}$ is really shorthand for $\frac{\partial}{\partial y}\left(\frac{\partial f}{\partial x}\right)$. It is read either as "The partial with respect to $y$ of the partial of $f$ with respect to $x$," or as "The second partial of $f$, first with respect to $x$, then with respect to $y$." Notice, also, that there is some ambiguity in the superscript notation. For example, does $f^{(1,1)}(x, y)$ denote $f_{xy}$ or $f_{yx}$? The two expressions are not always equal. However, thanks to the French mathematician Alexis Claude Clairaut (1713–1765), we know that the continuity of $f_{xy}$ and $f_{yx}$ ensures their equality.
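The equality of mixed partials can be illustrated numerically, though of course not proved. For $f(x, y) = \sin(xy)$ the sketch below makes two central-difference calculations at an arbitrarily chosen point:

- it differentiates $f_x = y\cos(xy)$ (from the text) with respect to $y$;
- it differentiates $f_y = x\cos(xy)$ (our own computation) with respect to $x$.

Both results should agree with the closed form $\cos(xy) - xy\sin(xy)$, which we computed by hand.

```python
import math


def f_x(x, y):
    # From the text: for f(x, y) = sin(xy), f_x(x, y) = y cos(xy)
    return y * math.cos(x * y)


def f_y(x, y):
    # Holding x constant: f_y(x, y) = x cos(xy)
    return x * math.cos(x * y)


def d_dx(g, x, y, h=1e-5):
    # Central difference in the x direction
    return (g(x + h, y) - g(x - h, y)) / (2 * h)


def d_dy(g, x, y, h=1e-5):
    # Central difference in the y direction
    return (g(x, y + h) - g(x, y - h)) / (2 * h)


x, y = 0.7, -1.3                      # an arbitrary test point
f_xy = d_dy(f_x, x, y)                # first x, then y
f_yx = d_dx(f_y, x, y)                # first y, then x
exact = math.cos(x * y) - x * y * math.sin(x * y)
print(f_xy, f_yx, exact)
```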
The following theorem states this fact more precisely.

Theorem 1 (Clairaut's Theorem) Suppose $f_{xy}$ and $f_{yx}$ are continuous at all points located in a disk centered at the point $(a, b)$. Then $f_{xy}(a, b) = f_{yx}(a, b)$.

Exercise 4.1 Verify Clairaut's theorem for $f(x, y) = x^5 y^4 + \sin(x^2 y)$ by evaluating the partials.

What about functions $f : \mathbb{R}^n \to \mathbb{R}$? Happily, everything we've said up to this point carries over in a very straightforward way. For example, if $w = f(x, y, z) = x^5 + x^3 y^4 z^3 + y z^{10}$, then $\frac{\partial w}{\partial y} = 4x^3 y^3 z^3 + z^{10}$. In general, given $\vec{x}_0 \in \mathbb{R}^n$ and $f : \mathbb{R}^n \to \mathbb{R}$ differentiable at $\vec{x}_0$, it can be shown that
$$f(\vec{x}) \approx L(\vec{x}) = f(\vec{x}_0) + f'(\vec{x}_0) \cdot (\vec{x} - \vec{x}_0).$$
Here $f'(\vec{x}_0) = \langle f_{x_1}(\vec{x}_0), f_{x_2}(\vec{x}_0), \ldots, f_{x_n}(\vec{x}_0) \rangle$, and $f_{x_k}(\vec{x}_0)$ designates the partial derivative with respect to the $k$th coordinate evaluated at $\vec{x}_0$.

Example 4.1 Find $f'(x, y, z)$ if $f(x, y, z) = \frac{x}{y^3 + z^4}$.

Solution.
$$f'(x, y, z) = \langle f_x, f_y, f_z \rangle = \left\langle \frac{1}{y^3 + z^4}, \frac{-3xy^2}{(y^3 + z^4)^2}, \frac{-4xz^3}{(y^3 + z^4)^2} \right\rangle.$$

Remark. Many texts use the symbol $\nabla f(\vec{x})$ to designate the vector of partial derivatives, and $\nabla f$ is called the gradient of $f$. Thus, in the last example we would write
$$\nabla f(x, y, z) = \left\langle \frac{1}{y^3 + z^4}, \frac{-3xy^2}{(y^3 + z^4)^2}, \frac{-4xz^3}{(y^3 + z^4)^2} \right\rangle.$$
It is important, however, not to equate $\nabla f(\vec{x})$ with $f'(\vec{x})$. Certainly, if a function $f$ has a derivative at the point $\vec{x}_0$, then all its partial derivatives exist at $\vec{x}_0$. However, the existence of all the partials at $\vec{x}_0$ is not sufficient to guarantee that the function $f$ is differentiable. Can you describe what such a situation would look like geometrically? Figure 10 illustrates a function $f(x, y)$ that is not differentiable at $(0, 0)$ even though its partial derivatives exist at that point. Its graph is shaped like a smooth spherical hill, except there is a cataclysmic drop to a flat plateau right along the $x$- and $y$-axes in the direction of the first quadrant.
[Figure 10: A non-differentiable function at $(0, 0)$ whose partials exist there. (a) The function itself; (b) the function and tangent plane at $(0, 0, 1000)$]

Suppose you were walking on this hill along the ridge $\{(x, 0, f(x, 0)) : x > 0\}$. All would be fine as long as your $y$-coordinate remained at 0. If, however, you took a tiny step in the positive $y$-direction, well, on the way down shout out to your friends, "Call 911!" The ridges defined by $\{(x, 0, f(x, 0)) : x \ge 0\}$ and $\{(0, y, f(0, y)) : y \ge 0\}$ are nice and smooth, so $f_x(0, 0)$ and $f_y(0, 0)$ clearly exist. But $f$ is not differentiable at $(0, 0)$ because there is no good linear (i.e., tangent plane) approximation to $f$ at that point. To see why, assume that the height of the graph of $f$ at $(0, 0)$ is 1000 feet. The only possible tangent plane at the point $(0, 0, 1000)$ is clearly parallel to the $xy$-plane, so it has the equation $z = 1000$. As Figure 10b illustrates, this plane is a good approximation to the function $z = f(x, y)$ for points $(x, y)$ near $(0, 0)$ unless $(x, y)$ is in the first quadrant. In that case, the tangent plane, whose $z$-values equal 1000, is a miserable approximation to the function $z = f(x, y)$, whose $z$-values equal the height of the plateau.

Exercise 4.2 What do $f_x(0, 0)$ and $f_y(0, 0)$ equal for the function illustrated in Figure 10a?

Exercise 4.3 The function in Figure 10a is discontinuous at $(0, 0)$. Explain why and then identify any other points at which the function is discontinuous. Finally, sketch a function that is continuous at $(0, 0)$, yet whose partial derivatives do not exist at that point. Explain.

The following theorem tells us when we can be sure that a function is differentiable.

Theorem 2 Suppose $f : \mathbb{R}^n \to \mathbb{R}$, and all the partial derivatives of $f$ exist near $\vec{x}_0$ and are continuous at $\vec{x}_0$. Then $f$ is differentiable at $\vec{x}_0$.

5 Directional Derivatives

Let's take stock of what we've learned so far. A function $f : \mathbb{R}^n \to \mathbb{R}$ is differentiable at $\vec{x}_0 \in \mathbb{R}^n$ if and only if there exists a linear function $L : \mathbb{R}^n \to \mathbb{R}$ that is a good approximation to the function $f$ for points $\vec{x}$ near $\vec{x}_0$. By saying that $L$ is a good approximation to $f$ we mean that
$$\lim_{\vec{x} \to \vec{x}_0} \frac{e(\vec{x})}{\|\vec{x} - \vec{x}_0\|} = 0, \quad \text{where } e(\vec{x}) = f(\vec{x}) - L(\vec{x}).$$
In other words, the error term (which measures the difference between $f$ and $L$) approaches zero much faster than does the distance between $\vec{x}$ and $\vec{x}_0$. The linear function $L$ has the form $L(\vec{x}) = f(\vec{x}_0) + \vec{m} \cdot (\vec{x} - \vec{x}_0)$, and the vector $\vec{m}$ has the partial derivatives of $f$ as its components. We define the derivative of $f$ at the point $\vec{x}_0$, therefore, to be that vector.
Thus, $\vec{m} = f'(\vec{x}_0) = \langle f_{x_1}(\vec{x}_0), f_{x_2}(\vec{x}_0), \ldots, f_{x_n}(\vec{x}_0) \rangle$. Consequently, if a function $f : \mathbb{R}^n \to \mathbb{R}$ is differentiable at $\vec{x}_0$ we know that
$$f(\vec{x}) \approx L(\vec{x}) = f(\vec{x}_0) + f'(\vec{x}_0) \cdot (\vec{x} - \vec{x}_0).$$
For functions of two variables, Equation (5) showed that the expression $\lim_{\vec{x} \to \vec{x}_0} \frac{e(\vec{x})}{\|\vec{x} - \vec{x}_0\|} = 0$ is the same as
$$\lim_{(x,y) \to (x_0,y_0)} \left[ \frac{f(x, y) - f(x_0, y_0)}{\sqrt{(x - x_0)^2 + (y - y_0)^2}} - \frac{a(x - x_0) + b(y - y_0)}{\sqrt{(x - x_0)^2 + (y - y_0)^2}} \right] = 0,$$
where $f'(\vec{x}_0) = \langle a, b \rangle$. The partial derivatives we looked at in the last section grew out of evaluating these limits as $(x, y)$ approached $(x_0, y_0)$ along straight lines parallel to the $x$- and $y$-axes. Of course, there are other straight-line directions that we can look at. What would happen in $\mathbb{R}^2$ if $\vec{x} = (x, y)$ were to approach $\vec{x}_0 = (x_0, y_0)$ along the direction of some arbitrary unit vector, say $\vec{u} = \langle u_1, u_2 \rangle$? In this case, we could write $\vec{x}$ as $\vec{x}_0 + h\vec{u} = (x_0, y_0) + h\langle u_1, u_2 \rangle = (x_0 + hu_1, y_0 + hu_2)$, and Equation (5) would then become
$$\lim_{h \to 0} \left[ \frac{f(x_0 + hu_1, y_0 + hu_2) - f(x_0, y_0)}{\sqrt{(hu_1)^2 + (hu_2)^2}} - \frac{a(hu_1) + b(hu_2)}{\sqrt{(hu_1)^2 + (hu_2)^2}} \right] = 0,$$
or
$$\lim_{h \to 0} \left[ \frac{f(\vec{x}_0 + h\vec{u}) - f(\vec{x}_0)}{\sqrt{(hu_1)^2 + (hu_2)^2}} - \frac{a(hu_1) + b(hu_2)}{\sqrt{(hu_1)^2 + (hu_2)^2}} \right] = 0, \tag{11}$$

where as before $a = f_x(x_0, y_0)$ and $b = f_y(x_0, y_0)$. Letting $h$ approach zero from the right and then from the left will reveal that Equation (11) implies
$$\lim_{h \to 0} \frac{f(\vec{x}_0 + h\vec{u}) - f(\vec{x}_0)}{h} = \langle a, b \rangle \cdot \vec{u}. \tag{12}$$

Exercise 5.1 Show that Equation (12) is a consequence of Equation (11) by analyzing what happens when $h$ approaches zero from the right and then from the left.

The limit on the left side of Equation (12) is called a directional derivative. Partial derivatives are merely special cases of directional derivatives. For partial derivatives the direction of approach is along the $x$-axis ($\vec{u} = \langle 1, 0 \rangle$ gives $f_x$) or along the $y$-axis ($\vec{u} = \langle 0, 1 \rangle$ gives $f_y$). As with partial derivatives, there are a variety of notations for directional derivatives:
$$\lim_{h \to 0} \frac{f(\vec{x}_0 + h\vec{u}) - f(\vec{x}_0)}{h} = f_{\vec{u}}(\vec{x}_0) = \frac{\partial f}{\partial \vec{u}} = D_{\vec{u}} f.$$
Our findings apply to functions of more than two variables as well. Below we formalize all of the above with a definition and theorem that apply to functions $f : \mathbb{R}^n \to \mathbb{R}$.

Definition 5.1 If $f : \mathbb{R}^n \to \mathbb{R}$, the quantity
$$f_{\vec{u}}(\vec{x}_0) = \lim_{h \to 0} \frac{f(\vec{x}_0 + h\vec{u}) - f(\vec{x}_0)}{h}$$
is called the directional derivative of $f$ with respect to the unit vector $\vec{u}$ evaluated at $\vec{x}_0$, provided the limit exists.

Theorem 3 If $f : \mathbb{R}^n \to \mathbb{R}$ is differentiable at $\vec{x}_0$, then the directional derivative of $f$ exists in the direction of any unit vector $\vec{u}$, and
$$f_{\vec{u}}(\vec{x}_0) = \langle f_{x_1}(\vec{x}_0), f_{x_2}(\vec{x}_0), \ldots, f_{x_n}(\vec{x}_0) \rangle \cdot \vec{u}.$$

Figure 11 shows:
(1) $f(x, y) = x^{4/3} + y^2 + 2$;
(2) the point $\vec{x}_0 = (x_0, y_0) = (\frac{1}{2}, \frac{1}{5})$;
(3) the tangent plane (linear function) approximation to $f$ at the point $(x_0, y_0, f(x_0, y_0)) \approx (\frac{1}{2}, \frac{1}{5}, 2.4)$;
(4) the unit vector $\vec{u} = \langle \frac{3}{5}, \frac{4}{5} \rangle$ drawn from $\vec{x}_0$;
(5) $\nabla f(\vec{x}_0) \approx \langle 1.06, 0.4 \rangle$.
(Recall that $\nabla f$, also called the gradient, is the vector of partial derivatives.) The displacement on the tangent plane directly above the points $\vec{x}_0$ and $\vec{x}_0 + \vec{u}$ is illustrated by the dark slanted line. The slope of this line is $f_{\vec{u}}(\vec{x}_0)$. Since $\vec{u}$ is a unit vector, the run is 1, so the slope equals the rise, represented by the dark vertical line directly above the point $\vec{x}_0 + \vec{u}$. In this case we compute
$$f_{\vec{u}}(\vec{x}_0) \approx \langle 1.06, 0.4 \rangle \cdot \left\langle \tfrac{3}{5}, \tfrac{4}{5} \right\rangle = 0.956.$$
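Definition 5.1 and Theorem 3 can be compared directly for the function of Figure 11. The sketch below is our own. It assumes $x > 0$, so that Python's `x ** (4 / 3)` stays real. It evaluates the difference quotient of Definition 5.1 for several values of $h$, including a negative one, and also computes the dot product $\nabla f(\vec{x}_0) \cdot \vec{u}$.

```python
def f(x, y):
    # Figure 11's function; x ** (4 / 3) is real because x > 0 here
    return x ** (4 / 3) + y**2 + 2


def grad_f(x, y):
    return ((4 / 3) * x ** (1 / 3), 2 * y)


x0, y0 = 0.5, 0.2
u = (3 / 5, 4 / 5)


def difference_quotient(h):
    # Definition 5.1: [f(x0 + h u) - f(x0)] / h
    return (f(x0 + h * u[0], y0 + h * u[1]) - f(x0, y0)) / h


gx, gy = grad_f(x0, y0)
via_theorem = gx * u[0] + gy * u[1]  # Theorem 3: gradient . u
for h in [1e-1, 1e-3, 1e-5, -1e-5]:
    print(f"h = {h:g}: quotient = {difference_quotient(h):.6f}")
print(f"gradient . u = {via_theorem:.6f}")
```

At full precision $\nabla f(\vec{x}_0) \cdot \vec{u} \approx 0.955$. The text's 0.956 comes from first rounding $\nabla f(\vec{x}_0)$ to $\langle 1.06, 0.4 \rangle$.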
The value 0.956 describes the instant when $\vec{x}_0 = (\frac{1}{2}, \frac{1}{5})$. At that instant, every unit change in the direction of $\vec{u} = \langle \frac{3}{5}, \frac{4}{5} \rangle$ produces 0.956 units of change in the $z$-value. Alternatively, at the same instant, every $\frac{3}{5}$ units of change in $x$ together with $\frac{4}{5}$ units of change in $y$ produces 0.956 units of change in the $z$-value.

[Figure 11: A directional derivative]

The circular segment drawn at the top of the tangent plane maps out where different displacement vectors above $\vec{x}_0 + \vec{u}$ would appear for unit vectors $\vec{u} = \langle \cos\theta, \sin\theta \rangle$, where $0 \le \theta \le \frac{\pi}{2}$. To see an animation of Figure 11 visit the following link: http://www.westmont.edu/~howell/courses/ma-019/illustrations/derivs/d-deriv.html.

The maximum value of $f_{\vec{u}}(\vec{x}_0)$ seems to occur when $\vec{u}$ happens to point in the same direction as $\nabla f(\vec{x}_0)$. This phenomenon is always the case, and explains why $\nabla f$ is called the gradient of $f$. Theorem 4 formalizes this result.
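The observation that the maximum occurs in the direction of the gradient can be probed numerically before the formal statement. This sketch uses the function of Figure 11 at $(\frac{1}{2}, \frac{1}{5})$. It sweeps $\vec{u} = \langle \cos\theta, \sin\theta \rangle$ over the full circle on a 0.1-degree grid; the grid is our choice, and the figure itself shows only $0 \le \theta \le \frac{\pi}{2}$. It then compares the best and worst angles with the angle of $\nabla f(\vec{x}_0)$.

```python
import math


def grad_f(x, y):
    # Gradient of Figure 11's f(x, y) = x^(4/3) + y^2 + 2
    return ((4 / 3) * x ** (1 / 3), 2 * y)


gx, gy = grad_f(0.5, 0.2)


def f_u(theta):
    # Directional derivative (Theorem 3) for u = <cos(theta), sin(theta)>
    return gx * math.cos(theta) + gy * math.sin(theta)


thetas = [2 * math.pi * k / 3600 for k in range(3600)]  # 0.1-degree grid
best = max(thetas, key=f_u)
worst = min(thetas, key=f_u)
print("argmax theta:", best, " angle of gradient:", math.atan2(gy, gx))
print("max f_u:", f_u(best), " |grad f|:", math.hypot(gx, gy))
print("argmin theta:", worst, " angle of -gradient:", math.atan2(gy, gx) + math.pi)
```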

Theorem 4 Suppose $f : \mathbb{R}^n \to \mathbb{R}$ is differentiable at $\vec{x}_0$, and $\nabla f(\vec{x}_0) \ne \vec{0}$. The maximum value for $f_{\vec{u}}(\vec{x}_0)$ occurs when $\vec{u}$ is in the same direction as $\nabla f(\vec{x}_0) = \langle f_{x_1}(\vec{x}_0), \ldots, f_{x_n}(\vec{x}_0) \rangle$. The minimum value for $f_{\vec{u}}(\vec{x}_0)$ occurs when $\vec{u}$ is in the same direction as $-\nabla f(\vec{x}_0)$.

Proof.
$$f_{\vec{u}}(\vec{x}_0) = \nabla f(\vec{x}_0) \cdot \vec{u} = \|\nabla f(\vec{x}_0)\| \, \|\vec{u}\| \cos\theta = \|\nabla f(\vec{x}_0)\| \cos\theta,$$
where $\theta$ is the angle between $\nabla f(\vec{x}_0)$ and $\vec{u}$. Now $\|\nabla f(\vec{x}_0)\|$ is constant, so the product $\|\nabla f(\vec{x}_0)\| \cos\theta$ is maximized when $\theta = 0$, with maximum value $\|\nabla f(\vec{x}_0)\|$. It is minimized when $\theta = \pi$, with minimum value $-\|\nabla f(\vec{x}_0)\|$.

Exercise 5.2 Rhett has so far been unable to impress anyone with his macho feats, so he has come up with a new scheme to show off his manhood. He gets everyone's attention and then jumps barefoot onto a flat bed of burning coals, landing at the point $(1, 2)$, measured in feet. The temperature (degrees Fahrenheit) at any point $(a, b)$ on the bed of coals equals $200 - a^3 b - a^4 b^2$, so he quickly senses the folly of his actions. In what direction should he move so as to have the temperature decrease most rapidly? At what rate will the temperature change if Rhett leaves the point $(1, 2)$ and heads directly towards the point $(3, 5)$? Make sure your answer contains appropriate units of measurement.

Exercise 5.3 Consider the function shown in Figure 10, and unit vectors $\vec{u} = \langle \cos\theta, \sin\theta \rangle$. For what values of $\theta$ does $f_{\vec{u}}(0, 0)$ exist, and what does it equal? Explain your reasoning.

6 The Multivariable Chain Rule

6.1 Introduction

For functions of one variable, $f'(c)$ is the slope of the line that is tangent to the graph of the function $f$ at the point $(c, f(c))$. The instantaneous rate of change perspective of the derivative focuses on how fast the function $f$ is changing at a particular point. For example, if $f'(c) = 4$, we can say that, at the instant when $x = c$, the $y$ value is changing four times as fast as the $x$ value.
If the function $f$ were to continue to behave as it does at this particular instant, every unit change in $x$ from the point $x = c$ would produce four units of change in the function $f$.

The chain rule tells us how to differentiate a function $h$ that is defined as a chain of two functions, that is, $h(x) = f(g(x)) = (f \circ g)(x)$. Figure 12 illustrates this concept.

Figure 12: The single variable chain rule ($\mathbb{R} \xrightarrow{g} \mathbb{R} \xrightarrow{f} \mathbb{R}$, sending $c \mapsto g(c) \mapsto h(c) = f(g(c))$)

Suppose that $g'(c) = 2$ and $f'(g(c)) = 3$. What could we conclude about the value of $h'(c)$? Using the instantaneous rate of change perspective of the derivative, we interpret $g'(c) = 2$ to mean that every unit change in $x$ from the point $x = c$ produces two units of change in the function $g$. Likewise, $f'(g(c)) = 3$ means that every unit change in the input of $f$ from the point $g(c)$ produces three units of change in the function $f$. It stands to reason, then, that every unit change in $x$ from the point $x = c$ produces $g'(c)\,f'(g(c)) = (2)(3) = 6$ units of change in the function $h$. Thus, $h'(c) = g'(c)\,f'(g(c))$, or equivalently $h'(c) = f'(g(c))\,g'(c)$.
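A quick numerical sanity check of the single-variable rule follows. It assumes the hypothetical functions $g(x) = x^2$ and $f(u) = 1.5u^2$ (not from the text), chosen so that $g'(1) = 2$ and $f'(g(1)) = f'(1) = 3$, matching the numbers in the discussion above. The helper `derivative` is a plain central-difference estimate, also introduced here for illustration.

```python
def derivative(func, x, h=1e-6):
    # Central-difference approximation of func'(x)
    return (func(x + h) - func(x - h)) / (2 * h)

# Hypothetical pair chosen so that g'(c) = 2 and f'(g(c)) = 3 at c = 1
def g(x):
    return x ** 2

def f(u):
    return 1.5 * u ** 2

def h(x):
    # The chain h = f o g, here h(x) = 1.5 x^4
    return f(g(x))

c = 1.0
chain_rule_value = derivative(f, g(c)) * derivative(g, c)  # f'(g(c)) * g'(c)
direct_value = derivative(h, c)                            # h'(c) estimated directly
print(chain_rule_value, direct_value)  # both approximately 6
```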

Exercise 6.1 For $h(x) = (4x^3 + 2x + 7)^2 = f(g(x))$, identify the following: a. $f(x)$; b. $g(x)$; c. $f'(13)$; d. $g(1)$; e. $g'(1)$; f. $h'(1)$.

Exercise 6.2 If $f(x) = x^3$ and $g(x) = 3x + 4$, evaluate $f(g(x))$ and $g(f(x))$.

6.2 Higher Dimensions

The multivariable chain rule is remarkably similar to its single-variable counterpart. Consider the function $h(t) = f(\mathbf{g}(t))$, where $\mathbf{g} : \mathbb{R} \to \mathbb{R}^2$ and $f : \mathbb{R}^2 \to \mathbb{R}$, as shown in Figure 13. The middle circle illustrates the output of the function $\mathbf{g}$ in red, with the point $\mathbf{g}(c)$ represented by a blue dot. The blue vector is $\mathbf{g}'(c)$. Hash marks depict it as being 2 units long, so that $\|\mathbf{g}'(c)\| = 2$.

Figure 13: The multivariable chain rule ($\mathbb{R} \xrightarrow{\mathbf{g}} \mathbb{R}^2 \xrightarrow{f} \mathbb{R}$, with $h(t) = f(\mathbf{g}(t)) = (f \circ \mathbf{g})(t)$)

Let $\mathbf{u}$ be the unit vector in the direction of $\mathbf{g}'(c)$ (assuming $\mathbf{g}'(c) \neq \mathbf{0}$, so that $\mathbf{u}$ is defined). That is, $\mathbf{u} = \mathbf{g}'(c)/\|\mathbf{g}'(c)\|$. The first two circles in Figure 13 illustrate that every unit change from the point $c$ produces $\|\mathbf{g}'(c)\| = 2$ units of (instantaneous) change in the function $\mathbf{g}$ in the direction of $\mathbf{u}$. As the second and third circles in the figure illustrate, every unit change from the point $\mathbf{g}(c)$ in the direction of $\mathbf{u}$ produces three units of change in the function $f$. This (instantaneous) change is computed by the directional derivative, i.e., $f_{\mathbf{u}}(\mathbf{g}(c)) = 3$. It follows that every unit change from the point $c$ produces $\|\mathbf{g}'(c)\|\, f_{\mathbf{u}}(\mathbf{g}(c)) = (2)(3) = 6$ units of (instantaneous) change in the function $h$. That is,
$$h'(c) = \|\mathbf{g}'(c)\|\, f_{\mathbf{u}}(\mathbf{g}(c)). \qquad (13)$$
Now, according to Theorem 3,
$$f_{\mathbf{u}}(\mathbf{g}(c)) = \nabla f(\mathbf{g}(c)) \cdot \mathbf{u} = \nabla f(\mathbf{g}(c)) \cdot \frac{\mathbf{g}'(c)}{\|\mathbf{g}'(c)\|}. \qquad (14)$$
Combining Equations (13) and (14) gives
$$h'(c) = \|\mathbf{g}'(c)\|\, f_{\mathbf{u}}(\mathbf{g}(c)) = \|\mathbf{g}'(c)\|\, \nabla f(\mathbf{g}(c)) \cdot \frac{\mathbf{g}'(c)}{\|\mathbf{g}'(c)\|} = \nabla f(\mathbf{g}(c)) \cdot \mathbf{g}'(c).$$
The following theorem summarizes these observations.

Theorem 5 Suppose $\mathbf{g} : \mathbb{R} \to \mathbb{R}^n$ is differentiable at $c$, and $f : \mathbb{R}^n \to \mathbb{R}$ is differentiable at $\mathbf{g}(c)$. Then $h(t) = f(\mathbf{g}(t))$ is differentiable at $c$, and $h'(c) = \nabla f(\mathbf{g}(c)) \cdot \mathbf{g}'(c)$.

We now trace through this logic again, but with a concrete example.
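As a minimal numerical check of Equations (13) and (14) and of Theorem 5, separate from the worked example in the text, the sketch below uses a hypothetical pair (not from the text): $\mathbf{g}(t) = \langle t^2, 3t \rangle$ and $f(x, y) = xy$, so that $h(t) = 3t^3$ and $h'(1) = 9$. It computes $h'(1)$ three ways: as $\|\mathbf{g}'(c)\|\, f_{\mathbf{u}}(\mathbf{g}(c))$, as $\nabla f(\mathbf{g}(c)) \cdot \mathbf{g}'(c)$, and by a central difference on $h$ itself. The derivatives `g_prime` and `grad_f` are written out by hand for this particular pair.

```python
import math

# Hypothetical example: g(t) = <t^2, 3t> and f(x, y) = x*y, so h(t) = f(g(t)) = 3t^3
def g(t):
    return (t ** 2, 3 * t)

def g_prime(t):
    return (2 * t, 3.0)

def f(x, y):
    return x * y

def grad_f(x, y):
    # <f_x, f_y> for f(x, y) = x*y
    return (y, x)

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

c = 1.0
gp = g_prime(c)
speed = math.hypot(*gp)               # ||g'(c)||
u = (gp[0] / speed, gp[1] / speed)    # unit vector in the direction of g'(c)
grad = grad_f(*g(c))                  # grad f(g(c))

via_13 = speed * dot(grad, u)         # Equation (13): ||g'(c)|| * f_u(g(c))
via_thm5 = dot(grad, gp)              # Theorem 5: grad f(g(c)) . g'(c)

# Direct central-difference estimate of h'(c) for comparison
step = 1e-6
direct = (f(*g(c + step)) - f(*g(c - step))) / (2 * step)
print(via_13, via_thm5, direct)  # all approximately 9
```

Dividing by $\|\mathbf{g}'(c)\|$ and then multiplying by it again is exactly the cancellation in the derivation above, which is why the first two values agree up to rounding.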

We begin with a chain of functions: $h(t) = f(\mathbf{g}(t))$, where
$$\mathbf{g}(t) = \left\langle -2.5t^3 + 7.2t^2 - 5.9t + 1.7,\ \tfrac{4}{3}t - \tfrac{17}{15} \right\rangle, \qquad f(x, y) = x^{4/3} + y^2 + 2,$$
and $c = 1$. With $c = 1$ we compute $\mathbf{g}(c) = \langle 0.5, 0.2 \rangle$, $\mathbf{g}'(c) = \left\langle 1, \tfrac{4}{3} \right\rangle$, and $\|\mathbf{g}'(c)\| = \tfrac{5}{3}$. The unit vector in the direction of $\mathbf{g}'(c)$ is $\mathbf{u} = \mathbf{g}'(c)/\|\mathbf{g}'(c)\| = \left\langle \tfrac{3}{5}, \tfrac{4}{5} \right\rangle$. Now, every unit change in $t$ from the point $t = c$ produces $\|\mathbf{g}'(c)\| = \tfrac{5}{3}$ units of change in the function $\mathbf{g}$ in the direction of $\mathbf{g}'(c)$. But every unit change from the point $\mathbf{g}(c)$ in the direction of $\mathbf{g}'(c)$ produces $f_{\mathbf{u}}(\mathbf{g}(c))$ units of change in the function $f$, where
$$f_{\mathbf{u}}(\mathbf{g}(c)) = \nabla f(\mathbf{g}(c)) \cdot \mathbf{u} = \nabla f(\mathbf{g}(c)) \cdot \frac{\mathbf{g}'(c)}{\|\mathbf{g}'(c)\|}.$$
Hence, the total instantaneous change in the function $h$ (per unit change from the point $t = c$) is
$$h'(c) = \|\mathbf{g}'(c)\| \left[ \nabla f(\mathbf{g}(c)) \cdot \frac{\mathbf{g}'(c)}{\|\mathbf{g}'(c)\|} \right] = \nabla f(\mathbf{g}(c)) \cdot \mathbf{g}'(c),$$
which is what the multivariable chain rule stipulates. For our given functions, we compute $\nabla f(x, y) = \left\langle \tfrac{4}{3} x^{1/3}, 2y \right\rangle$, so
$$\nabla f(\mathbf{g}(c)) = \nabla f(\mathbf{g}(1)) = \nabla f(0.5, 0.2) \approx \langle 1.06, 0.4 \rangle.$$
Figure 14: The chain rule
Combining all this information gives a numerical value for the derivative:
$$h'(c) = h'(1) = \nabla f(\mathbf{g}(1)) \cdot \mathbf{g}'(1) \approx \langle 1.06, 0.4 \rangle \cdot \left\langle 1, \tfrac{4}{3} \right\rangle \approx 1.59.$$
Thus, every unit change from the point $c = 1$ on the $t$-axis produces (instantaneously) a change of about 1.59 units on the $z$-axis. In other words, the instantaneous rate of change of the function $h$ at the point $h(1) = f(\mathbf{g}(1))$ amounts to 1.59 units for every unit change from the point $c = 1$ on the $t$-axis. The length of the vertical yellow line directly above the vector $\mathbf{g}(c) + \mathbf{g}'(c)$ in Figure 14, then, must equal 1.59 units. An enlarged web illustration of these ideas can be found at the following link: http://www.westmont.edu/~howell/courses/ma-019/illustrations/derivs/chain-rule.pdf.

6.3 Leibniz Notation

The Leibniz notation for $h'(t)$ is similar to that for single-variable calculus. We let $z$ represent the output of the function $f$ and also consider $z$ as a function of the variable $t$.
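We then have
$$\nabla f(\mathbf{g}(t)) = \nabla f(x, y) = \langle f_x, f_y \rangle = \left\langle \frac{\partial z}{\partial x}, \frac{\partial z}{\partial y} \right\rangle \quad\text{and}\quad \mathbf{g}'(t) = \left\langle \frac{dx}{dt}, \frac{dy}{dt} \right\rangle,$$
so that
$$\frac{dz}{dt} = \left\langle \frac{\partial z}{\partial x}, \frac{\partial z}{\partial y} \right\rangle \cdot \left\langle \frac{dx}{dt}, \frac{dy}{dt} \right\rangle = \frac{\partial z}{\partial x}\frac{dx}{dt} + \frac{\partial z}{\partial y}\frac{dy}{dt}.$$
Most texts write the above formula without the dot product, so that it reads as follows:
$$\frac{dz}{dt} = \frac{\partial z}{\partial x}\frac{dx}{dt} + \frac{\partial z}{\partial y}\frac{dy}{dt}.$$
Great care must be taken in using this approach, however, because Leibniz notation has weaknesses: the functions chained together are obscured, it is not clear at what points the derivatives are evaluated, and the variable $z$ must play a dual role (as a direct output of both $h(t)$ and $f(x, y)$).

For the functions of Figure 14 we have
$$\langle x, y \rangle = \mathbf{g}(t) = \left\langle -2.5t^3 + 7.2t^2 - 5.9t + 1.7,\ \tfrac{4}{3}t - \tfrac{17}{15} \right\rangle \quad\text{and}\quad z = f(x, y) = x^{4/3} + y^2 + 2.$$
Thus,
$$\frac{dx}{dt} = -7.5t^2 + 14.4t - 5.9, \quad \frac{dy}{dt} = \frac{4}{3}, \quad \frac{\partial z}{\partial x} = \frac{4}{3} x^{1/3}, \quad\text{and}\quad \frac{\partial z}{\partial y} = 2y.$$
Combining this information gives
$$\frac{dz}{dt} = \left(\tfrac{4}{3} x^{1/3}\right)\left(-7.5t^2 + 14.4t - 5.9\right) + (2y)\left(\tfrac{4}{3}\right),$$
which we evaluate at $t = 1$ and $(x, y) = (0.5, 0.2)$. Do you see why we evaluate the resulting expression at these points?

An advantage of the Leibniz notation is that it more easily reveals how to piece the various functions of a chain together. Suppose that $w$ is the output of a function of three variables, say $w = f(x, y, z)$, and that the vector $\langle x, y, z \rangle$ is the output of a function $\mathbf{g} : \mathbb{R}^2 \to \mathbb{R}^3$, so that $\mathbf{g}(s, t) = \langle x, y, z \rangle$. Then $w$ also depends on $s$ and $t$ because $w = f(\mathbf{g}(s, t))$, and
$$\frac{\partial w}{\partial s} = \frac{\partial w}{\partial x}\frac{\partial x}{\partial s} + \frac{\partial w}{\partial y}\frac{\partial y}{\partial s} + \frac{\partial w}{\partial z}\frac{\partial z}{\partial s}, \qquad \frac{\partial w}{\partial t} = \frac{\partial w}{\partial x}\frac{\partial x}{\partial t} + \frac{\partial w}{\partial y}\frac{\partial y}{\partial t} + \frac{\partial w}{\partial z}\frac{\partial z}{\partial t}.$$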
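The sketch below evaluates the Leibniz expression for the Figure 14 functions at $t = 1$, where $(x, y) = \mathbf{g}(1) = (0.5, 0.2)$, and compares it with a central-difference estimate of $h'(1)$ computed directly from $h(t) = f(\mathbf{g}(t))$. Both should agree with the value of about 1.59 found in the worked example. The helper names `g_prime` and `grad_f` are introduced here for illustration; `grad_f` uses `x ** (1/3)`, which assumes $x > 0$ (true near $t = 1$).

```python
import math

# Functions from the Figure 14 example
def g(t):
    return (-2.5 * t**3 + 7.2 * t**2 - 5.9 * t + 1.7, (4 / 3) * t - 17 / 15)

def g_prime(t):
    # <dx/dt, dy/dt>
    return (-7.5 * t**2 + 14.4 * t - 5.9, 4 / 3)

def f(x, y):
    return x ** (4 / 3) + y**2 + 2

def grad_f(x, y):
    # <dz/dx, dz/dy>; x ** (1/3) assumes x > 0 here
    return ((4 / 3) * x ** (1 / 3), 2 * y)

t = 1.0
x, y = g(t)                # the point where the partial derivatives are evaluated
dx_dt, dy_dt = g_prime(t)
dz_dx, dz_dy = grad_f(x, y)

# Leibniz form: dz/dt = (dz/dx)(dx/dt) + (dz/dy)(dy/dt)
dz_dt = dz_dx * dx_dt + dz_dy * dy_dt

# Independent check: central difference of h(t) = f(g(t))
step = 1e-6
direct = (f(*g(t + step)) - f(*g(t - step))) / (2 * step)

print(f"(x, y) = ({x:.3f}, {y:.3f}), ||g'(1)|| = {math.hypot(dx_dt, dy_dt):.4f}")
print(f"dz/dt = {dz_dt:.4f}, direct estimate = {direct:.4f}")
```

Printing $(x, y)$ makes explicit why the partial derivatives are evaluated at $(0.5, 0.2)$ while $dx/dt$ and $dy/dt$ are evaluated at $t = 1$.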