REVIEW OF DIFFERENTIAL CALCULUS

DONU ARAPURA

Date: February 14, 2016.

1. Limits and continuity

To simplify the statements, we will often stick to two variables, but everything holds with any number of variables. Let $f(x, y)$ be a real valued function defined on an open subset of $\mathbb{R}^2$. The limit

$$\lim_{(x,y)\to(a,b)} f(x, y) = L$$

means that $f(x, y)$ is approximately $L$ whenever $(x, y)$ is close to $(a, b)$. The precise meaning is as follows. If we specify $\epsilon > 0$ (say $\epsilon = 0.0005$), then we can pick a tolerance $\delta > 0$ which guarantees that $|f(x, y) - L| < \epsilon$ (i.e. $f(x, y)$ agrees with $L$ up to the first 3 digits for $\epsilon = 0.0005$) whenever $(x, y) \neq (a, b)$ and the distance between $(x, y)$ and $(a, b)$ is less than $\delta$. Here are a few examples.

Example 1.1. Consider

$$\lim_{(x,y)\to(0,0)} \frac{x^2}{x^2 + y^2}$$

Along the x-axis ($y = 0$), the fraction is identically 1, which means that the limit would have to be 1 if it existed. However, along the y-axis ($x = 0$), the fraction is identically 0, so the limit would have to be 0 if it existed. The two statements are contradictory, so we must conclude that the limit does not exist.

Example 1.2. Consider

$$\lim_{(x,y)\to(0,0)} \frac{x^3}{x^2 + y^2}$$

The same sort of analysis would show that the limits along the x and y axes are both zero. This leads us to suspect the answer is 0, but it is not enough to draw a definite conclusion. Instead, we use the definition. Given $\epsilon > 0$, we have to find $\delta > 0$ so that

$$\left|\frac{x^3}{x^2 + y^2}\right| < \epsilon \quad \text{when } 0 < \|(x, y)\| < \delta.$$

If we rewrite this using polar coordinates, we get

$$\left|\frac{x^3}{x^2 + y^2}\right| = \left|\frac{r^3 \cos^3\theta}{r^2}\right| = r\,|\cos^3\theta| \le r$$

while $\|(x, y)\| = r$. This shows that we can take $\delta = \epsilon$.
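Examples 1.1 and 1.2 can also be explored numerically. The following is only a rough sketch, not a proof: it samples the two fractions along rays through the origin. The helper names (`f1`, `f2`, `along_ray`) and the choice of radii and angles are illustrative assumptions, not part of the notes.

```python
import math

def f1(x, y):
    # Fraction from Example 1.1
    return x**2 / (x**2 + y**2)

def f2(x, y):
    # Fraction from Example 1.2
    return x**3 / (x**2 + y**2)

def along_ray(f, theta, radii):
    # Values of f at the points (r cos(theta), r sin(theta)) for each r
    return [f(r * math.cos(theta), r * math.sin(theta)) for r in radii]

radii = [10.0 ** (-k) for k in range(1, 6)]  # r = 0.1, 0.01, ..., 1e-5

# Example 1.1: different rays give different limiting values.
x_axis = along_ray(f1, 0.0, radii)           # stays at 1
y_axis = along_ray(f1, math.pi / 2, radii)   # stays (numerically) at 0

# Example 1.2: |f2| is bounded by r on every sampled ray,
# in line with the polar-coordinate estimate.
thetas = [k * math.pi / 6 for k in range(12)]
bound_holds = all(
    abs(f2(r * math.cos(t), r * math.sin(t))) <= r * (1 + 1e-12)
    for r in radii
    for t in thetas
)

print(x_axis)
print(y_axis)
print(bound_holds)
```

Sampling finitely many rays can only suggest the answer for Example 1.2; it is the estimate by $r$ that settles it.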
A function $f(x, y)$ is continuous at $(a, b)$ if

$$\lim_{(x,y)\to(a,b)} f(x, y)$$

exists and equals $f(a, b)$. It is continuous if it is so at each point of its domain. Most of the basic functions are continuous.

Theorem 1.3. The following functions of one or two variables are continuous:
(1) $f(x, y) = x + y$, $(x, y) \in \mathbb{R}^2$
(2) $f(x, y) = xy$, $(x, y) \in \mathbb{R}^2$
(3) $f(x, y) = x/y$, $(x, y) \in \mathbb{R}^2$, $y \neq 0$
(4) $\exp(x) = e^x$, $x \in \mathbb{R}$
(5) $\sin x$, $x \in \mathbb{R}$
(6) $\cos x$, $x \in \mathbb{R}$
(7) $\log x$, $x > 0$ (we always use $\log = \log_e = \ln$).

The next theorem will allow us to build more complicated continuous functions from continuous parts.

Theorem 1.4. If $f(y_1, \dots, y_m)$, $g_1(x_1, \dots, x_n), \dots, g_m(x_1, \dots, x_n)$ are continuous, then so is

$$f(g_1(x_1, \dots, x_n), \dots, g_m(x_1, \dots, x_n))$$

Example 1.5. Check that

$$f(x, y) = \begin{cases} \dfrac{\sin(x^2 + y^2)}{x^2 + y^2} & \text{if } (x, y) \neq (0, 0) \\ 1 & \text{if } (x, y) = (0, 0) \end{cases}$$

is continuous on $\mathbb{R}^2$. For the solution, we introduce the auxiliary function

$$g(t) = \begin{cases} \dfrac{\sin(t)}{t} & \text{if } t \neq 0 \\ 1 & \text{if } t = 0 \end{cases}$$

This is clearly continuous for $t \neq 0$ by the above theorems. It is also continuous at $t = 0$, because we have

$$\lim_{t \to 0} g(t) = 1$$

by L'Hôpital's rule. Therefore $f(x, y) = g(x^2 + y^2)$ is continuous everywhere.

2. Differentiability

Given a function $f(x, y)$, the partial derivatives are defined by

$$\frac{\partial f}{\partial x}(x_0, y_0) = f_x(x_0, y_0) = \lim_{h \to 0} \frac{f(x_0 + h, y_0) - f(x_0, y_0)}{h}$$

$$\frac{\partial f}{\partial y}(x_0, y_0) = f_y(x_0, y_0) = \lim_{h \to 0} \frac{f(x_0, y_0 + h) - f(x_0, y_0)}{h}$$

These are slopes of the curves obtained by slicing $z = f(x, y)$ parallel to the x and y axes. We say that $f$ is differentiable if, near any point $p = (x_0, y_0, f(x_0, y_0))$, the graph $z = f(x, y)$ can be approximated by a plane passing through $p$. In other words, there exist quantities $A$, $B$ such that we may write

$$f(x, y) = f(x_0, y_0) + A(x - x_0) + B(y - y_0) + \text{remainder}$$

with

$$\lim_{(x,y)\to(x_0,y_0)} \frac{\text{remainder}}{\|(x, y) - (x_0, y_0)\|} = 0$$
An equivalent way to express the last condition is that for every $\epsilon > 0$

$$|\text{remainder}| < \epsilon\,\|(x, y) - (x_0, y_0)\|$$

when the distance $\|(x, y) - (x_0, y_0)\|$ is small enough. The idea is that as the distance $\|(x, y) - (x_0, y_0)\|$ goes to zero, the remainder goes to zero at an even faster rate. We can see that the coefficients are nothing but the partial derivatives

$$A = \frac{\partial f}{\partial x}(x_0, y_0), \quad B = \frac{\partial f}{\partial y}(x_0, y_0)$$

There is a stronger condition which is generally easier to check. $f$ is called continuously differentiable or $C^1$ if its partial derivatives exist and are continuous.

Theorem 2.1. A $C^1$ function is differentiable.

Example 2.2. Check that $f(x, y) = e^x + \cos xy$ is $C^1$. An easy calculation shows that $f_x = e^x - y \sin xy$ and $f_y = -x \sin xy$. These functions are easily seen to be continuous using the previous criteria.

Example 2.3. Let

$$f(x, y) = \frac{xy}{\sqrt{x^2 + y^2}}$$

with $f(0, 0) = 0$. This is continuous (check this). Away from the origin, the partials are

$$\frac{\partial f}{\partial x} = \frac{y^3}{(x^2 + y^2)^{3/2}}, \quad \frac{\partial f}{\partial y} = \frac{x^3}{(x^2 + y^2)^{3/2}}$$

But these are not continuous at the origin (check). So $f$ is not $C^1$. More work shows that it is not differentiable either.

3. Chain rule

In one variable, if $y = f(x)$, $x = g(t)$ where $f$ and $g$ are differentiable, then the chain rule says

$$\frac{dy}{dt} = \frac{dy}{dx}\frac{dx}{dt}$$

Rewriting this in functional notation

$$(f \circ g)'(t) = f'(g(t))\,g'(t)$$

should convince you that there is more to it than just canceling $dx$. Let us start with the simplest analogue for several variables.

Theorem 3.1. Suppose that $z = f(x, y)$, $x = g(t)$, $y = h(t)$, where the functions are differentiable. Then

$$\frac{dz}{dt} = \frac{\partial z}{\partial x}\frac{dx}{dt} + \frac{\partial z}{\partial y}\frac{dy}{dt}$$

Proof. In outline,

$$g(t + \Delta t) = g(t) + g'(t)\Delta t + R_1$$
$$h(t + \Delta t) = h(t) + h'(t)\Delta t + R_2$$

where the remainders $R_i/\Delta t \to 0$ as $\Delta t \to 0$. Using differentiability of $f$, we can see that

$$f(g(t + \Delta t), h(t + \Delta t)) = f(g(t), h(t)) + f_x(g(t), h(t))\,g'(t)\Delta t + f_y(g(t), h(t))\,h'(t)\Delta t + R_3$$
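where $R_3/\Delta t \to 0$. This implies

$$\frac{f(g(t + \Delta t), h(t + \Delta t)) - f(g(t), h(t))}{\Delta t} = \frac{f_x(g(t), h(t))\,g'(t)\Delta t + f_y(g(t), h(t))\,h'(t)\Delta t + R_3}{\Delta t} \to f_x(g(t), h(t))\,g'(t) + f_y(g(t), h(t))\,h'(t)$$

To state the general form, let $F : \mathbb{R}^n \to \mathbb{R}^m$ be a vector valued function. This can be written out as

$$F(x_1, \dots, x_n) = (f_1(x_1, \dots, x_n), f_2(x_1, \dots, x_n), \dots)$$

where the $f_i$ are scalar valued functions. The matrix derivative, or Jacobian, is

$$DF = \begin{pmatrix} \dfrac{\partial f_1}{\partial x_1} & \dfrac{\partial f_1}{\partial x_2} & \cdots \\ \dfrac{\partial f_2}{\partial x_1} & \dfrac{\partial f_2}{\partial x_2} & \cdots \\ \vdots & \vdots & \end{pmatrix}$$

This is an $m \times n$ matrix valued function. We say that $F$ is differentiable if

$$F(v + h) = F(v) + DF(v)h + \text{remainder}$$

where, as before, the remainder goes to 0 faster than $h$ does, that is, $\|\text{remainder}\|/\|h\| \to 0$ as $h \to 0$. Also to be clear, $v, h \in \mathbb{R}^n$ are vectors. The expression $DF(v)h$ means: first evaluate the function $DF$ at $v$, and then multiply the resulting matrix with $h$.

Theorem 3.2 (Chain rule). Given differentiable functions $F : \mathbb{R}^n \to \mathbb{R}^m$ and $G : \mathbb{R}^m \to \mathbb{R}^p$, the function $G \circ F(v) = G(F(v))$ is also differentiable. Its derivative is

$$D(G \circ F)(v) = DG(F(v))\,DF(v)$$

To see what this means more concretely, write

$$x_i = f_i(u_1, \dots, u_n), \quad y_i = g_i(x_1, \dots, x_m)$$

where

$$F(u_1, \dots, u_n) = (f_1(u_1, \dots, u_n), \dots)$$

and

$$G(x_1, \dots, x_m) = (g_1(x_1, \dots, x_m), \dots)$$

Then the chain rule says that

$$\begin{pmatrix} \dfrac{\partial y_1}{\partial u_1} & \dfrac{\partial y_1}{\partial u_2} & \cdots \\ \dfrac{\partial y_2}{\partial u_1} & \cdots & \\ \vdots & & \end{pmatrix} = \begin{pmatrix} \dfrac{\partial y_1}{\partial x_1} & \dfrac{\partial y_1}{\partial x_2} & \cdots \\ \dfrac{\partial y_2}{\partial x_1} & \cdots & \\ \vdots & & \end{pmatrix} \begin{pmatrix} \dfrac{\partial x_1}{\partial u_1} & \dfrac{\partial x_1}{\partial u_2} & \cdots \\ \dfrac{\partial x_2}{\partial u_1} & \cdots & \\ \vdots & & \end{pmatrix}$$

This can be written out as a series of equations

$$\frac{\partial y_i}{\partial u_j} = \frac{\partial y_i}{\partial x_1}\frac{\partial x_1}{\partial u_j} + \frac{\partial y_i}{\partial x_2}\frac{\partial x_2}{\partial u_j} + \cdots + \frac{\partial y_i}{\partial x_m}\frac{\partial x_m}{\partial u_j}$$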
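The matrix form of the chain rule can be sanity-checked numerically. The sketch below approximates Jacobians by central differences and compares $D(G \circ F)(v)$ with the product $DG(F(v))\,DF(v)$. The maps `F` and `G`, the point `v`, and the step size are arbitrary choices made for illustration; the helpers `jacobian` and `matmul` are hypothetical names, not anything from the notes.

```python
import math

def jacobian(func, v, h=1e-6):
    # Central-difference approximation of the m x n Jacobian of func at v
    m, n = len(func(v)), len(v)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        vp, vm = list(v), list(v)
        vp[j] += h
        vm[j] -= h
        fp, fm = func(vp), func(vm)
        for i in range(m):
            J[i][j] = (fp[i] - fm[i]) / (2 * h)
    return J

def matmul(A, B):
    # Plain matrix product of nested lists
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def F(u):
    # Illustrative map R^2 -> R^3
    return [u[0] * u[1], u[0] + u[1], math.sin(u[0])]

def G(x):
    # Illustrative map R^3 -> R^2
    return [x[0] + x[1] * x[2], math.exp(x[0])]

v = [0.5, -1.2]
lhs = jacobian(lambda u: G(F(u)), v)            # D(G o F)(v), a 2 x 2 matrix
rhs = matmul(jacobian(G, F(v)), jacobian(F, v))  # DG(F(v)) DF(v)

max_diff = max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(2))
print(max_diff)
```

Note the shapes: $DG(F(v))$ is $2 \times 3$ and $DF(v)$ is $3 \times 2$, so the product is $2 \times 2$, matching the Jacobian of the composite. The two sides agree only up to finite-difference error.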
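4. Gradient

Given $f(x_1, x_2, \dots)$, the matrix derivative is usually called the gradient

$$\nabla f = Df = \left(\frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \dots\right)$$

To understand what this means, let us work in the plane. Given a unit vector $v = (v_1, v_2) \in \mathbb{R}^2$, the directional derivative of $f$ along $v$ at $(x_0, y_0)$ is

$$\lim_{h \to 0} \frac{f(x_0 + v_1 h, y_0 + v_2 h) - f(x_0, y_0)}{h}$$

This is the slope of the curve obtained by cutting the surface $z = f(x, y)$ by a line parallel to $v$.

Theorem 4.1. If $f$ is differentiable, then the directional derivative of $f$ along $v$ is

$$\nabla f \cdot v = \frac{\partial f}{\partial x} v_1 + \frac{\partial f}{\partial y} v_2$$

Proof. The directional derivative can be rewritten as $\frac{dz}{dt}$ at $t = 0$, where $z = f(x, y)$, $x = x_0 + v_1 t$, $y = y_0 + v_2 t$. Therefore the chain rule shows that

$$\frac{dz}{dt} = \frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt} = \frac{\partial f}{\partial x} v_1 + \frac{\partial f}{\partial y} v_2$$

Given a unit vector $v$ making angle $\theta$ with $\nabla f$, the directional derivative $\nabla f \cdot v = \|\nabla f\| \cos\theta$ is maximized when $\theta = 0$. We can summarize this as

Corollary 4.2. The gradient $\nabla f$ points in the direction where $f$ increases the fastest, and $-\nabla f$ points in the direction where $f$ decreases the fastest.

Theorem 4.3. The gradient $\nabla f(x_0, y_0, z_0)$ gives a normal to the surface $f(x, y, z) = c$ at $(x_0, y_0, z_0)$.

Proof. We have to show that $\nabla f \cdot v = 0$ for any tangent vector $v$ to the surface at $(x_0, y_0, z_0)$. Such a vector is the tangent vector to some curve $C$ lying on the surface. We can parameterize it by $x = g(t)$, $y = h(t)$, $z = k(t)$ with $(x_0, y_0, z_0)$ corresponding to $t = 0$. Since $C$ lies on the surface,

$$f(g(t), h(t), k(t)) = c$$

Differentiating both sides and using the chain rule gives

$$\frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt} + \frac{\partial f}{\partial z}\frac{dz}{dt} = 0$$

We can write this as $\nabla f \cdot v = 0$, which is what we wanted to prove.

5. Second order derivatives

Given $z = f(x, y)$, the second order partial derivatives are defined by

$$\frac{\partial^2 f}{\partial x^2} = \frac{\partial^2 z}{\partial x^2} = f_{xx} = z_{xx} = \frac{\partial}{\partial x}\left(\frac{\partial f}{\partial x}\right)$$

$$\frac{\partial^2 f}{\partial x\,\partial y} = \frac{\partial^2 z}{\partial x\,\partial y} = f_{yx} = z_{yx} = \frac{\partial}{\partial x}\left(\frac{\partial f}{\partial y}\right)$$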
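$$\frac{\partial^2 f}{\partial y\,\partial x} = \frac{\partial^2 z}{\partial y\,\partial x} = f_{xy} = z_{xy} = \frac{\partial}{\partial y}\left(\frac{\partial f}{\partial x}\right)$$

$$\frac{\partial^2 f}{\partial y^2} = \frac{\partial^2 z}{\partial y^2} = f_{yy} = z_{yy} = \frac{\partial}{\partial y}\left(\frac{\partial f}{\partial y}\right)$$

A similar story holds for more than two variables. A function $f(x_1, x_2, \dots)$ is twice continuously differentiable or $C^2$ if its partial derivatives up to second order exist and are continuous.

Theorem 5.1. If $f(x_1, x_2, \dots)$ is $C^2$ then the mixed partials are equal:

$$\frac{\partial^2 f}{\partial x_j\,\partial x_i} = \frac{\partial^2 f}{\partial x_i\,\partial x_j}$$

Proof. We just outline the proof for $f(x, y)$. We have

$$\frac{\partial^2 f}{\partial x\,\partial y} = \lim_{\Delta x \to 0} \frac{f_y(x + \Delta x, y) - f_y(x, y)}{\Delta x}$$

$$= \lim_{\Delta x \to 0} \frac{1}{\Delta x}\left[\lim_{\Delta y \to 0} \frac{f(x + \Delta x, y + \Delta y) - f(x + \Delta x, y)}{\Delta y} - \lim_{\Delta y \to 0} \frac{f(x, y + \Delta y) - f(x, y)}{\Delta y}\right]$$

$$= \lim_{\Delta x \to 0}\,\lim_{\Delta y \to 0} \frac{1}{\Delta x\,\Delta y}\left[f(x + \Delta x, y + \Delta y) - f(x + \Delta x, y) - f(x, y + \Delta y) + f(x, y)\right]$$

By a similar argument

$$\frac{\partial^2 f}{\partial y\,\partial x} = \lim_{\Delta y \to 0}\,\lim_{\Delta x \to 0} \frac{1}{\Delta x\,\Delta y}\left[f(x + \Delta x, y + \Delta y) - f(x + \Delta x, y) - f(x, y + \Delta y) + f(x, y)\right]$$

If we could interchange the limit symbols we would be done. As a general rule, this is not always possible. However, in our case, using the $C^2$ condition it can be shown that both expressions coincide with

$$\lim_{(\Delta x, \Delta y) \to (0,0)} \frac{1}{\Delta x\,\Delta y}\left[f(x + \Delta x, y + \Delta y) - f(x + \Delta x, y) - f(x, y + \Delta y) + f(x, y)\right]$$

If $f(x, y)$ is $C^2$, then we have a Taylor expansion

$$f(x, y) = f(a, b) + \left[\frac{\partial f}{\partial x}(a, b)\right](x - a) + \left[\frac{\partial f}{\partial y}(a, b)\right](y - b) + \frac{1}{2}\left[\frac{\partial^2 f}{\partial x^2}(a, b)\right](x - a)^2 + \left[\frac{\partial^2 f}{\partial y\,\partial x}(a, b)\right](x - a)(y - b) + \frac{1}{2}\left[\frac{\partial^2 f}{\partial y^2}(a, b)\right](y - b)^2 + R$$

To be useful, we need to know that the remainder $R$ is small when $(x, y)$ is close to $(a, b)$. Here is the precise statement.

Theorem 5.2. The remainder $R$ satisfies

$$\lim_{(x,y)\to(a,b)} \frac{R}{\|(x, y) - (a, b)\|^2} = 0$$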
Proof. Let $g(t) = f(a + t(x - a), b + t(y - b))$. This is a $C^2$ function. So by the usual Taylor formula for one variable,

$$g(t) = g(0) + g'(0)t + \frac{1}{2} g''(0)t^2 + R \qquad (1)$$

where $R/t^2 \to 0$ as $t \to 0$. By the chain rule

$$g'(t) = f_x(a + t(x - a), b + t(y - b))(x - a) + f_y(a + t(x - a), b + t(y - b))(y - b)$$

$$g''(t) = f_{xx}(a + t(x - a), b + t(y - b))(x - a)^2 + \cdots$$

Substituting into (1) and setting $t = 1$ gives the two variable Taylor expansion, where the remainder behaves as stated.

6. Maxima-Minima

Partial derivatives can be used to determine maxima and minima. Given a $C^1$ function on $\mathbb{R}^n$, a critical point is a point where the gradient is zero. We have the following basic fact:

Theorem 6.1. If $f$ is a $C^1$ function on $\mathbb{R}^n$ with a local maximum or minimum at a point, then this is a critical point.

Proof. Let us assume that $n = 2$ and call the point $(a, b)$. Let $z = f(x, y)$ be the graph. By calculus of one variable, the slopes of the curves $z = f(x, b)$ and $z = f(a, y)$ are zero at $(a, b)$. This means that the partial derivatives are zero at $(a, b)$.

Example 6.2. Let $f(x, y) = \sin x \sin y$. Setting $f_x = \cos x \sin y = 0$ and $f_y = \sin x \cos y = 0$ gives infinitely many solutions $(0, 0), (\pi/2, \pi/2), \dots$ To proceed further, we need the second derivative test.

Now suppose that $(a, b) = (0, 0)$ is a critical point. Then Taylor's formula reduces to

$$f(x, y) = f(0, 0) + \frac{1}{2}\left[f_{xx}(0, 0)x^2 + 2f_{xy}(0, 0)xy + f_{yy}(0, 0)y^2\right] + R$$

where the remainder $R$ is small in the sense of Theorem 5.2. Let us rewrite the quadratic part (in square brackets) as

$$Q(x, y) = Ax^2 + 2Bxy + Cy^2$$

We want to assume that this quadratic function is nondegenerate in the sense that $\nabla Q(x, y) \neq 0$ unless $(x, y) = (0, 0)$ (equivalently, $AC - B^2 \neq 0$). Then it will dominate the remainder, so it controls how $f(x, y)$ behaves close to $(0, 0)$. Suppose for now that $A \neq 0$. Then we can complete the square

$$Ax^2 + 2Bxy + Cy^2 = A\left(x + \frac{B}{A}y\right)^2 + \frac{1}{A}(AC - B^2)y^2$$

Notice that $AC - B^2$ cannot be zero because of our nondegeneracy condition.
It follows that if $A > 0$ and the discriminant $AC - B^2 > 0$, then we always have

$$Q(x, y) > 0 \quad \text{if } (x, y) \neq (0, 0)$$

Therefore we have a local minimum at $(0, 0)$. If $A < 0$ and $AC - B^2 > 0$, then

$$Q(x, y) < 0 \quad \text{if } (x, y) \neq (0, 0)$$
Therefore we have a local maximum at $(0, 0)$. Finally, suppose that $AC - B^2 < 0$. Then the signs of the coefficients of

$$A\left(x + \frac{B}{A}y\right)^2 + \frac{1}{A}(AC - B^2)y^2$$

are opposite. This means that we have a saddle point at $(0, 0)$. With a bit more work, we can see that this conclusion holds even if $A = 0$. To summarize:

Theorem 6.3. Let $(a, b)$ be a critical point of a $C^2$ function $f(x, y)$. Then
(1) It is a local minimum if $f_{xx}(a, b) > 0$ and $f_{xx}(a, b)f_{yy}(a, b) - f_{xy}(a, b)^2 > 0$.
(2) It is a local maximum if $f_{xx}(a, b) < 0$ and $f_{xx}(a, b)f_{yy}(a, b) - f_{xy}(a, b)^2 > 0$.
(3) It is a saddle point if $f_{xx}(a, b)f_{yy}(a, b) - f_{xy}(a, b)^2 < 0$.
(4) If $f_{xx}(a, b)f_{yy}(a, b) - f_{xy}(a, b)^2 = 0$, $Q$ is degenerate, so the test gives no information.

Example 6.4. To finish Example 6.2, let $f(x, y) = \sin x \sin y$. Then $f_{xx} = -\sin x \sin y$, $f_{xy} = \cos x \cos y$, $f_{yy} = -\sin x \sin y$. At $(0, 0)$, the discriminant $f_{xx}(0, 0)f_{yy}(0, 0) - f_{xy}(0, 0)^2 = -1$, so this is a saddle point. At $(\pi/2, \pi/2)$, $f_{xx}(\pi/2, \pi/2) = -1$ and the discriminant is 1, so this gives a local maximum.

To finish the story, we have to explain how the second derivative test works in more than two variables. If $f(x_1, \dots, x_n)$ is $C^2$, we have a Taylor expansion

$$f(x_1, \dots, x_n) = f(0, \dots, 0) + \sum_{i=1}^{n} \frac{\partial f}{\partial x_i}(0, \dots, 0)\,x_i + \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} \frac{\partial^2 f}{\partial x_i\,\partial x_j}(0, \dots, 0)\,x_i x_j + R$$

where we are expanding around the origin for simplicity. The remainder $R$ goes to zero faster than $\|(x_1, \dots, x_n)\|^2$ as $(x_1, \dots, x_n) \to (0, \dots, 0)$. Let us write

$$h_{ij} = \frac{\partial^2 f}{\partial x_i\,\partial x_j}(0, \dots, 0)$$

$$Q(x_1, \dots, x_n) = \sum_{i=1}^{n}\sum_{j=1}^{n} h_{ij}\,x_i x_j$$

Then the $h_{ij}$ form an $n \times n$ matrix $H$, called the Hessian; more precisely it is the Hessian evaluated at $(0, \dots, 0)$. We know that $h_{ij} = h_{ji}$ by equality of mixed partials. Therefore $H$ is a symmetric matrix. The last condition can be expressed by saying that $H$ is equal to its transpose $H^T$.
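The two variable test of Theorem 6.3 is easy to try out numerically. The following sketch estimates the second partials by central differences and then applies the discriminant rule. It assumes the point passed in is already known to be a critical point; the function name `second_derivative_test`, the step `h`, and the tolerance `tol` are illustrative assumptions.

```python
import math

def second_derivative_test(f, a, b, h=1e-4, tol=1e-6):
    # Central-difference estimates of f_xx, f_yy, f_xy at (a, b)
    fxx = (f(a + h, b) - 2.0 * f(a, b) + f(a - h, b)) / h**2
    fyy = (f(a, b + h) - 2.0 * f(a, b) + f(a, b - h)) / h**2
    fxy = (f(a + h, b + h) - f(a + h, b - h)
           - f(a - h, b + h) + f(a - h, b - h)) / (4.0 * h**2)
    disc = fxx * fyy - fxy**2  # discriminant from Theorem 6.3
    if disc < -tol:
        return "saddle point"
    if disc > tol:
        return "local minimum" if fxx > 0 else "local maximum"
    return "inconclusive"  # discriminant numerically zero

def f(x, y):
    # Function from Examples 6.2 and 6.4
    return math.sin(x) * math.sin(y)

print(second_derivative_test(f, 0.0, 0.0))
print(second_derivative_test(f, math.pi / 2, math.pi / 2))
```

The classifications for $(0, 0)$ and $(\pi/2, \pi/2)$ match Example 6.4. Because the derivatives are only approximated, a discriminant close to zero is reported as inconclusive rather than trusted.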
The function $Q$, or matrix $H$, is called
nondegenerate if $\nabla Q(x_1, \dots, x_n) \neq 0$ when $(x_1, \dots, x_n)$ is nonzero (equivalently, $H$ is invertible),
positive definite if $Q(x_1, \dots, x_n) > 0$ when $(x_1, \dots, x_n)$ is nonzero,
negative definite if $Q(x_1, \dots, x_n) < 0$ when $(x_1, \dots, x_n)$ is nonzero.

Now suppose $(0, \dots, 0)$ is a critical point. Then Taylor's formula becomes

$$f(x_1, \dots, x_n) = f(0, \dots, 0) + \frac{1}{2} Q(x_1, \dots, x_n) + R$$

When $Q$ is nondegenerate, it controls the behaviour near the critical point. Here is the general second derivative test.

Theorem 6.5. If $H$ is the Hessian evaluated at a critical point of $f$, then
(1) if $H$ is positive definite, the point is a local minimum,
(2) if $H$ is negative definite, the point is a local maximum,
(3) in all other cases, either $H$ is nondegenerate and the point is a saddle point, or else $H$ is degenerate and the test is inconclusive.

In order to turn this into a useful test, we need a method for checking when a matrix $H$ is positive or negative definite. If $H$ is a diagonal matrix then there is an obvious criterion: it is positive definite exactly when the diagonal entries are all positive. To extend this to general matrices, we recall a definition from linear algebra. A real or complex number $\lambda$ is called an eigenvalue of $H$ if there is a nonzero $n \times 1$ column vector $v$, called an eigenvector, such that

$$Hv = \lambda v$$

There is a standard method for finding the eigenvalues of a matrix, which can be found in any book on linear algebra. Or else one can do this numerically using a computer. The following result is a consequence of the spectral theorem in linear algebra.

Theorem 6.6. If $H$ is a real symmetric matrix, then all its eigenvalues are real. $H$ is positive definite (respectively negative definite) if and only if all its eigenvalues are positive (respectively negative).

This leads to a practical test to find maxima and minima when combined with the previous theorem.
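As a small illustration of the eigenvalue test, here is a sketch restricted to the $2 \times 2$ symmetric case, where the eigenvalues of $\begin{pmatrix} a & b \\ b & c \end{pmatrix}$ have the closed form $\frac{a + c}{2} \pm \sqrt{\left(\frac{a - c}{2}\right)^2 + b^2}$. For larger matrices one would normally rely on a numerical linear algebra library instead. The function names and the tolerance are assumptions made for this example.

```python
import math

def eigenvalues_sym2(a, b, c):
    # Eigenvalues of the symmetric matrix [[a, b], [b, c]], smallest first
    mean = (a + c) / 2.0
    radius = math.hypot((a - c) / 2.0, b)
    return (mean - radius, mean + radius)

def classify(a, b, c, tol=1e-12):
    # Apply Theorems 6.5 and 6.6 to a 2 x 2 Hessian at a critical point
    lo, hi = eigenvalues_sym2(a, b, c)
    if lo > tol:
        return "local minimum"   # all eigenvalues positive
    if hi < -tol:
        return "local maximum"   # all eigenvalues negative
    if abs(lo) <= tol or abs(hi) <= tol:
        return "inconclusive"    # a zero eigenvalue: H is degenerate
    return "saddle point"        # eigenvalues of opposite sign

# Hessians of sin(x) sin(y) from Example 6.4
print(classify(0.0, 1.0, 0.0))    # Hessian at (0, 0)
print(classify(-1.0, 0.0, -1.0))  # Hessian at (pi/2, pi/2)
```

The entries passed to `classify` are $f_{xx}$, $f_{xy}$, $f_{yy}$ at the critical point, so the two calls reproduce the conclusions of Example 6.4 by way of eigenvalues rather than the discriminant.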