REVIEW OF DIFFERENTIAL CALCULUS

DONU ARAPURA

1. Limits and continuity

To simplify the statements, we will often stick to two variables, but everything holds with any number of variables. Let f(x, y) be a real valued function defined on an open subset of R². The limit

    lim_{(x,y)→(a,b)} f(x, y) = L

means that f(x, y) is approximately L whenever (x, y) is close to (a, b). The precise meaning is as follows. If we specify ε > 0 (say ε = 0.0005), then we can pick a tolerance δ > 0 which guarantees that |f(x, y) − L| < ε (i.e. f(x, y) agrees with L up to the first 3 digits for ε = 0.0005) whenever the distance between (x, y) and (a, b) is less than δ. Here are a few examples.

Example 1.1. Consider

    lim_{(x,y)→(0,0)} x² / (x² + y²).

Along the x-axis (y = 0), the fraction is identically 1, which means that the limit would have to be 1 if it existed. However, along the y-axis (x = 0), the fraction is identically 0, so the limit would have to be 0 if it existed. The two statements are contradictory, so we must conclude that the limit does not exist.

Example 1.2. Consider

    lim_{(x,y)→(0,0)} x³ / (x² + y²).

The same sort of analysis shows that the limits along the x and y axes are both zero. This leads us to suspect the answer is 0, but it is not enough to draw a definite conclusion. Instead, we use the definition. Given ε > 0, we have to find δ > 0 so that

    |x³ / (x² + y²)| < ε   whenever |(x, y)| < δ.

If we rewrite this using polar coordinates, we get

    |x³ / (x² + y²)| = |r³ cos³θ / r²| = r |cos³θ| ≤ r,

while |(x, y)| = r. This shows that we can take δ = ε if ε < 1 and (say) δ = 1/2 otherwise.

Date: February 14, 2016.
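The two limit examples above can also be probed numerically. The sketch below is not from the notes (the function names f1 and f2 are mine); it illustrates the path-dependence in Example 1.1 and the bound |x³/(x² + y²)| ≤ r in Example 1.2.

```python
import math

# Example 1.1: the value depends on the path of approach, so no limit exists.
def f1(x, y):
    return x * x / (x * x + y * y)

# Example 1.2: bounded by r = |(x, y)|, so the limit is 0.
def f2(x, y):
    return x ** 3 / (x * x + y * y)

assert f1(1e-8, 0.0) == 1.0   # along the x-axis the fraction is 1
assert f1(0.0, 1e-8) == 0.0   # along the y-axis it is 0

# |x^3/(x^2 + y^2)| = r|cos^3 θ| <= r shrinks with r in every direction
for r in (1e-1, 1e-3, 1e-5):
    x, y = r * math.cos(0.7), r * math.sin(0.7)  # 0.7 is an arbitrary angle
    assert abs(f2(x, y)) <= r
```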

A function f(x, y) is continuous at (a, b) if

    lim_{(x,y)→(a,b)} f(x, y)

exists and equals f(a, b). It is continuous if it is so at each point of its domain. Most of the basic functions are continuous.

Theorem 1.3. The following functions of one or two variables are continuous:
(1) f(x, y) = x + y, (x, y) ∈ R²
(2) f(x, y) = xy, (x, y) ∈ R²
(3) f(x, y) = x/y, (x, y) ∈ R², y ≠ 0
(4) exp(x) = eˣ, x ∈ R
(5) sin x, x ∈ R
(6) cos x, x ∈ R
(7) log x, x > 0 (we always use log = log_e = ln).

The next theorem allows us to build more complicated continuous functions from continuous parts.

Theorem 1.4. If f(y₁, …, y_m), g₁(x₁, …, x_n), … are continuous, then so is

    f(g₁(x₁, …, x_n), …, g_m(x₁, …, x_n)).

Example 1.5. Check that

    f(x, y) = sin(x² + y²) / (x² + y²) if (x, y) ≠ (0, 0),   f(0, 0) = 1

is continuous on R². For the solution, we introduce the auxiliary function

    g(t) = sin(t)/t if t ≠ 0,   g(0) = 1.

This is clearly continuous for t ≠ 0 by the above theorems. It is also continuous at t = 0, because lim_{t→0} g(t) = 1 by L'Hopital's rule. Therefore f(x, y) = g(x² + y²) is continuous everywhere.

2. Differentiability

Given a function f(x, y), the partial derivatives are

    ∂f/∂x (x₀, y₀) = f_x(x₀, y₀) = lim_{h→0} [f(x₀ + h, y₀) − f(x₀, y₀)] / h

    ∂f/∂y (x₀, y₀) = f_y(x₀, y₀) = lim_{h→0} [f(x₀, y₀ + h) − f(x₀, y₀)] / h

These are the slopes of the curves obtained by slicing z = f(x, y) parallel to the x and y axes. We say that f is differentiable if near any point p = (x₀, y₀, f(x₀, y₀)) the graph z = f(x, y) can be approximated by a plane passing through p. In other words, there exist quantities A, B such that we may write

    f(x, y) = f(x₀, y₀) + A(x − x₀) + B(y − y₀) + remainder

with

    lim_{(x,y)→(x₀,y₀)} remainder / |(x, y) − (x₀, y₀)| = 0.
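Example 1.5 lends itself to a quick numerical sanity check. The sketch below (helper names are mine, not from the notes) patches sin(t)/t by its limiting value and confirms that values of f approach f(0, 0) = 1 near the origin.

```python
import math

# Example 1.5: f(x, y) = sin(x^2 + y^2)/(x^2 + y^2), patched to 1 at the
# origin, factors as g(x^2 + y^2) for g(t) = sin(t)/t patched at t = 0.
def g(t):
    return math.sin(t) / t if t != 0 else 1.0

def f(x, y):
    return g(x * x + y * y)

assert f(0.0, 0.0) == 1.0
# values tend to f(0, 0) = 1 as (x, y) -> (0, 0)
for r in (1e-1, 1e-2, 1e-3):
    assert abs(f(r, r) - 1.0) < r
```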

An equivalent way to express the last condition is that for every ε > 0,

    |remainder| < ε |(x, y) − (x₀, y₀)|

when the distance |(x, y) − (x₀, y₀)| is small enough. The idea is that as the distance |(x, y) − (x₀, y₀)| goes to zero, the remainder goes to zero at an even faster rate. We can see that the coefficients are nothing but the partial derivatives:

    A = ∂f/∂x (x₀, y₀),   B = ∂f/∂y (x₀, y₀).

There is a stronger condition which is generally easier to check: f is called continuously differentiable, or C¹, if it and its partial derivatives exist and are continuous.

Theorem 2.1. A C¹ function is differentiable.

Example 2.2. Check that f(x, y) = eˣ + cos xy is C¹. An easy calculation shows that f_x = eˣ − y sin xy and f_y = −x sin xy. These functions are easily seen to be continuous using the previous criteria.

Example 2.3. Let f(x, y) = xy / √(x² + y²) with f(0, 0) = 0; this is continuous (check this). The partials are

    ∂f/∂x = y³ / (x² + y²)^{3/2},   ∂f/∂y = x³ / (x² + y²)^{3/2}.

But these are not continuous at the origin (check). So f is not C¹. More work shows that it is not differentiable either.

3. Chain rule

In one variable, if y = f(x) and x = g(t), where f and g are differentiable, then the chain rule says

    dy/dt = (dy/dx)(dx/dt).

Rewriting this in functional notation, (f ∘ g)′(t) = f′(g(t)) g′(t), should convince you that there is more to it than just canceling dx. Let us start with the simplest analogue for several variables.

Theorem 3.1. Suppose that z = f(x, y), x = g(t), y = h(t), where the functions are differentiable. Then

    dz/dt = (∂z/∂x)(dx/dt) + (∂z/∂y)(dy/dt).

Proof. In outline,

    g(t + Δt) = g(t) + g′(t)Δt + R₁
    h(t + Δt) = h(t) + h′(t)Δt + R₂

where the remainders satisfy R_i/Δt → 0 as Δt → 0. Using differentiability of f, we can see that

    f(g(t + Δt), h(t + Δt)) = f(g(t), h(t)) + f_x(g(t), h(t)) g′(t)Δt + f_y(g(t), h(t)) h′(t)Δt + R₃
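The partials claimed in Example 2.2 can be verified against central differences; a minimal sketch (the evaluation point is my own arbitrary choice):

```python
import math

# Example 2.2: f(x, y) = e^x + cos(xy).
def f(x, y):
    return math.exp(x) + math.cos(x * y)

def fx(x, y):
    # claimed in the text: f_x = e^x - y sin(xy)
    return math.exp(x) - y * math.sin(x * y)

def fy(x, y):
    # claimed in the text: f_y = -x sin(xy)
    return -x * math.sin(x * y)

h = 1e-6
x0, y0 = 0.3, -1.2
num_fx = (f(x0 + h, y0) - f(x0 - h, y0)) / (2 * h)  # central difference in x
num_fy = (f(x0, y0 + h) - f(x0, y0 - h)) / (2 * h)  # central difference in y
assert abs(num_fx - fx(x0, y0)) < 1e-6
assert abs(num_fy - fy(x0, y0)) < 1e-6
```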

where R₃/Δt → 0. This implies

    [f(g(t + Δt), h(t + Δt)) − f(g(t), h(t))] / Δt
        = f_x(g(t), h(t)) g′(t) + f_y(g(t), h(t)) h′(t) + R₃/Δt
        → f_x(g(t), h(t)) g′(t) + f_y(g(t), h(t)) h′(t).

To state the general form, let F: Rⁿ → Rᵐ be a vector valued function. This can be written out as

    F(x₁, …, x_n) = (f₁(x₁, …, x_n), f₂(x₁, …, x_n), …)

where the f_i are scalar valued functions. The matrix or Jacobian derivative is

    DF = ( ∂f₁/∂x₁  ∂f₁/∂x₂  ⋯ )
         ( ∂f₂/∂x₁  ∂f₂/∂x₂  ⋯ )
         (    ⋮         ⋮        )

This is an m × n matrix valued function. We say that F is differentiable if

    F(v + h) = F(v) + DF(v)h + remainder

where, as before, the remainder goes to 0 as h → 0 at the same rate or faster. Also, to be clear, v, h ∈ Rⁿ are vectors. The expression DF(v)h means: first evaluate the function DF at v, and then multiply the resulting matrix with h.

Theorem 3.2 (Chain rule). Given differentiable functions F: Rⁿ → Rᵐ and G: Rᵐ → Rᵖ, the function G ∘ F(v) = G(F(v)) is also differentiable, and its derivative is

    D(G ∘ F)(v) = DG(F(v)) DF(v).

To see what this means more concretely, write

    x_i = f_i(u₁, …, u_n),   y_i = g_i(x₁, …, x_m)

where

    F(u₁, …, u_n) = (f₁(u₁, …, u_n), …),   G(x₁, …, x_m) = (g₁(x₁, …, x_m), …).

Then the chain rule says that

    ( ∂y₁/∂u₁  ∂y₁/∂u₂  ⋯ )     ( ∂y₁/∂x₁  ∂y₁/∂x₂  ⋯ ) ( ∂x₁/∂u₁  ∂x₁/∂u₂  ⋯ )
    ( ∂y₂/∂u₁     ⋱        )  =  ( ∂y₂/∂x₁     ⋱        ) ( ∂x₂/∂u₁     ⋱        )

This can be written out as a series of equations

    ∂y_i/∂u_j = (∂y_i/∂x₁)(∂x₁/∂u_j) + (∂y_i/∂x₂)(∂x₂/∂u_j) + ⋯ + (∂y_i/∂x_m)(∂x_m/∂u_j).
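The matrix identity D(G∘F)(v) = DG(F(v)) DF(v) can be checked numerically with finite-difference Jacobians. This is only a sketch under my own choice of maps F, G: R² → R² (none of the helper names come from the notes):

```python
import math

def F(u1, u2):
    return (u1 * u2, u1 + math.sin(u2))

def G(x1, x2):
    return (math.exp(x1), x1 * x2)

def jac(fun, p, eps=1e-6):
    # central-difference Jacobian: entry (i, j) approximates d(fun_i)/d(p_j)
    cols = []
    for j in range(len(p)):
        pp, pm = list(p), list(p)
        pp[j] += eps
        pm[j] -= eps
        fp, fm = fun(*pp), fun(*pm)
        cols.append([(a - b) / (2 * eps) for a, b in zip(fp, fm)])
    m = len(cols[0])
    return [[cols[j][i] for j in range(len(p))] for i in range(m)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

v = (0.4, 0.9)
lhs = jac(lambda u1, u2: G(*F(u1, u2)), v)   # D(G∘F)(v)
rhs = matmul(jac(G, F(*v)), jac(F, v))       # DG(F(v)) · DF(v)
assert all(abs(lhs[i][j] - rhs[i][j]) < 1e-4
           for i in range(2) for j in range(2))
```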

4. Gradient

Given f(x₁, x₂, …), the matrix derivative is usually called the gradient

    ∇f = Df = (∂f/∂x₁, ∂f/∂x₂, …).

To understand what this means, let us work in the plane. Given a unit vector v = (v₁, v₂) ∈ R², the directional derivative of f along v at (x₀, y₀) is

    lim_{h→0} [f(x₀ + v₁h, y₀ + v₂h) − f(x₀, y₀)] / h.

This is the slope of the curve obtained by cutting the surface z = f(x, y) over the line through (x₀, y₀) parallel to v.

Theorem 4.1. If f is differentiable, then the directional derivative of f along v is

    ∇f · v = (∂f/∂x) v₁ + (∂f/∂y) v₂.

Proof. The directional derivative can be rewritten as dz/dt, where x = x₀ + v₁t, y = y₀ + v₂t. Therefore the chain rule shows that

    dz/dt = (∂f/∂x)(dx/dt) + (∂f/∂y)(dy/dt) = (∂f/∂x) v₁ + (∂f/∂y) v₂.

Given a unit vector v making angle θ with ∇f, the directional derivative ∇f · v = |∇f| cos θ is maximized when θ = 0. We can summarize this as:

Corollary 4.2. The gradient ∇f points in the direction where f increases the fastest, and −∇f points in the direction where f decreases the fastest.

Theorem 4.3. The gradient ∇f(x₀, y₀, z₀) gives a normal to the surface f(x, y, z) = c at (x₀, y₀, z₀).

Proof. We have to show that ∇f · v = 0 for any tangent vector v to the surface at (x₀, y₀, z₀). Such a vector is the tangent vector to some curve C lying on the surface. We can parameterize it by x = g(t), y = h(t), z = k(t), with (x₀, y₀, z₀) corresponding to t = 0. Since C lies on the surface,

    f(g(t), h(t), k(t)) = c.

Differentiating both sides and using the chain rule gives

    (∂f/∂x)(dx/dt) + (∂f/∂y)(dy/dt) + (∂f/∂z)(dz/dt) = 0.

We can write this as ∇f · v = 0, which is what we wanted to prove.

5. Second order derivatives

Given z = f(x, y), the second order partial derivatives are defined by

    ∂²f/∂x² = ∂²z/∂x² = f_xx = z_xx = ∂/∂x (∂f/∂x)

    ∂²f/∂x∂y = ∂²z/∂x∂y = f_yx = z_yx = ∂/∂x (∂f/∂y)
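Theorem 4.1 can be illustrated numerically by comparing a difference quotient along v with the dot product ∇f · v. The function, point, and direction below are my own choices, not from the notes:

```python
import math

# f(x, y) = x^2 y + cos y, with gradient (2xy, x^2 - sin y)
def f(x, y):
    return x * x * y + math.cos(y)

def grad_f(x, y):
    return (2 * x * y, x * x - math.sin(y))

x0, y0 = 1.0, 0.5
theta = 1.1
v = (math.cos(theta), math.sin(theta))   # unit vector at angle theta

h = 1e-6
# difference quotient along v (central version for accuracy)
numeric = (f(x0 + v[0] * h, y0 + v[1] * h)
           - f(x0 - v[0] * h, y0 - v[1] * h)) / (2 * h)
# Theorem 4.1: directional derivative = ∇f · v
formula = grad_f(x0, y0)[0] * v[0] + grad_f(x0, y0)[1] * v[1]
assert abs(numeric - formula) < 1e-6
```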

    ∂²f/∂y∂x = ∂²z/∂y∂x = f_xy = z_xy = ∂/∂y (∂f/∂x)

    ∂²f/∂y² = ∂²z/∂y² = f_yy = z_yy = ∂/∂y (∂f/∂y)

A similar story holds for more than two variables. A function f(x₁, x₂, …) is twice continuously differentiable, or C², if its partial derivatives up to second order exist and are continuous.

Theorem 5.1. If f(x₁, x₂, …) is C², then the mixed partials are equal:

    ∂²f/∂x_j∂x_i = ∂²f/∂x_i∂x_j.

Proof. We just outline the proof for f(x, y). We have

    ∂²f/∂x∂y = lim_{Δx→0} [f_y(x + Δx, y) − f_y(x, y)] / Δx
             = lim_{Δx→0} lim_{Δy→0} (1/ΔxΔy) [f(x + Δx, y + Δy) − f(x + Δx, y) − f(x, y + Δy) + f(x, y)].

By a similar argument,

    ∂²f/∂y∂x = lim_{Δy→0} lim_{Δx→0} (1/ΔxΔy) [f(x + Δx, y + Δy) − f(x + Δx, y) − f(x, y + Δy) + f(x, y)].

If we could interchange the limit symbols we would be done. As a general rule, this is not always possible. However, in our case, using the C² condition it can be shown that both expressions coincide with

    lim_{(Δx,Δy)→(0,0)} (1/ΔxΔy) [f(x + Δx, y + Δy) − f(x + Δx, y) − f(x, y + Δy) + f(x, y)].

If f(x, y) is C², then we have a Taylor expansion

    f(x, y) = f(a, b) + (∂f/∂x)(a, b)(x − a) + (∂f/∂y)(a, b)(y − b)
              + (1/2)(∂²f/∂x²)(a, b)(x − a)² + (∂²f/∂y∂x)(a, b)(x − a)(y − b)
              + (1/2)(∂²f/∂y²)(a, b)(y − b)² + R.

To be useful, we need to know that the remainder R is small when (x, y) is close to (a, b). Here is the precise statement.

Theorem 5.2. The remainder R satisfies

    lim_{(x,y)→(a,b)} R / |(x, y) − (a, b)|² = 0.
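The symmetric second difference appearing in the proof of Theorem 5.1 can be evaluated directly: for a C² function it approximates both mixed partials. A small sketch with a test function of my own choosing:

```python
import math

# f(x, y) = e^{xy} + x sin y, so f_xy = (1 + xy) e^{xy} + cos y
def f(x, y):
    return math.exp(x * y) + x * math.sin(y)

def second_difference(x, y, h=1e-4):
    # (1/h^2)[f(x+h, y+h) - f(x+h, y) - f(x, y+h) + f(x, y)],
    # the quantity from the proof of Theorem 5.1 with Δx = Δy = h
    return (f(x + h, y + h) - f(x + h, y) - f(x, y + h) + f(x, y)) / (h * h)

x0, y0 = 0.3, 0.7
exact = (1 + x0 * y0) * math.exp(x0 * y0) + math.cos(y0)  # exact f_xy
assert abs(second_difference(x0, y0) - exact) < 1e-2
```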

Proof. Let g(t) = f(a + t(x − a), b + t(y − b)). This is a C² function, so by the usual Taylor formula for one variable,

(1)    g(t) = g(0) + g′(0) t + (1/2) g″(0) t² + R,

where R/t² → 0 as t → 0. By the chain rule,

    g′(t) = f_x(a + t(x − a), b + t(y − b))(x − a) + f_y(a + t(x − a), b + t(y − b))(y − b)

    g″(t) = f_xx(a + t(x − a), b + t(y − b))(x − a)² + ⋯

Substituting into (1) and setting t = 1 gives the two variable Taylor expansion, where the remainder behaves as stated.

6. Maxima-Minima

Partial derivatives can be used to determine maxima and minima. Given a C¹ function on Rⁿ, a critical point is a point where the gradient is zero. We have the following basic fact:

Theorem 6.1. If f is a C¹ function on Rⁿ with a local maximum or minimum at a point, then this is a critical point.

Proof. Let us assume that n = 2 and call the point (a, b). Let z = f(x, y) be the graph. By calculus of one variable, the slopes of the curves z = f(x, b) and z = f(a, y) are zero at (a, b). This means that the partial derivatives are zero at (a, b).

Example 6.2. Let f(x, y) = sin x sin y. Setting f_x = cos x sin y = 0 and f_y = sin x cos y = 0 gives infinitely many solutions (0, 0), (π/2, π/2), … To proceed further, we need the second derivative test.

Now suppose that (a, b) = (0, 0) is a critical point. Then Taylor's formula reduces to

    f(x, y) = f(0, 0) + (1/2)[f_xx(0, 0) x² + 2 f_xy(0, 0) xy + f_yy(0, 0) y²] + R,

where the remainder R is small in the sense of Theorem 5.2. Let us rewrite the quadratic part (in square brackets) as

    Q(x, y) = Ax² + 2Bxy + Cy².

We want to assume that this quadratic function is nondegenerate, in the sense that Q(x, y) ≠ 0 unless (x, y) = (0, 0). Then it dominates the remainder, so it controls how f(x, y) behaves close to (0, 0). Suppose for now that A ≠ 0. Then we can complete the square:

    Ax² + 2Bxy + Cy² = A(x + (B/A)y)² + (1/A)(AC − B²) y².

Notice that AC − B² cannot be zero because of our nondegeneracy condition.
It follows that if A > 0 and the discriminant AC − B² > 0, then we always have

    Q(x, y) > 0   if (x, y) ≠ (0, 0),

so we have a local minimum at (0, 0). If A < 0 and AC − B² > 0, then

    Q(x, y) < 0   if (x, y) ≠ (0, 0),

so we have a local maximum at (0, 0). Finally, suppose that AC − B² < 0. Then the signs of the coefficients of

    A(x + (B/A)y)² + (1/A)(AC − B²) y²

are opposite. This means that we have a saddle point at (0, 0). With a bit more work, we can see that this conclusion holds even if A = 0. To summarize:

Theorem 6.3. Suppose (a, b) is a critical point of a C² function f(x, y). Then
(1) It is a local minimum if f_xx(a, b) > 0 and f_xx(a, b) f_yy(a, b) − f_xy(a, b)² > 0.
(2) It is a local maximum if f_xx(a, b) < 0 and f_xx(a, b) f_yy(a, b) − f_xy(a, b)² > 0.
(3) It is a saddle point if f_xx(a, b) f_yy(a, b) − f_xy(a, b)² < 0.
(4) If f_xx(a, b) f_yy(a, b) − f_xy(a, b)² = 0, Q is degenerate, so the test gives no information.

Example 6.4. To finish Example 6.2: let f(x, y) = sin x sin y, so f_xx = −sin x sin y, f_xy = cos x cos y, f_yy = −sin x sin y. At (0, 0), the discriminant f_xx(0, 0) f_yy(0, 0) − f_xy(0, 0)² = −1, so this is a saddle point. At (π/2, π/2), f_xx(π/2, π/2) = −1 and the discriminant is 1, so this gives a local maximum.

To finish the story, we have to explain how the second derivative test works in more than two variables. If f(x₁, …, x_n) is C², we have a Taylor expansion

    f(x₁, …, x_n) = f(0, …, 0) + Σᵢ (∂f/∂x_i)(0, …, 0) x_i + (1/2) Σᵢ Σⱼ (∂²f/∂x_i∂x_j)(0, …, 0) x_i x_j + R,

where we are expanding around the origin for simplicity. The remainder R goes to zero fast as (x₁, …, x_n) → (0, …, 0). Let us write

    h_ij = (∂²f/∂x_i∂x_j)(0, …, 0),   Q(x₁, …, x_n) = Σᵢ Σⱼ h_ij x_i x_j.

Then the h_ij form an n × n matrix H, called the Hessian; more precisely, it is the Hessian evaluated at (0, …, 0). We know that h_ij = h_ji by equality of mixed partials. Therefore H is a symmetric matrix. The last condition can be expressed by saying that H is equal to its transpose Hᵀ.
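The two-variable test of Theorem 6.3, applied to Examples 6.2 and 6.4, can be sketched in a few lines. The classifier helper below is my own; the second partials are the ones computed in the text.

```python
import math

# Theorem 6.3 as a small decision procedure on the second partials.
def classify(fxx, fxy, fyy):
    disc = fxx * fyy - fxy * fxy   # the discriminant f_xx f_yy - f_xy^2
    if disc > 0:
        return "local min" if fxx > 0 else "local max"
    if disc < 0:
        return "saddle"
    return "inconclusive"          # degenerate case (4)

# second partials of f = sin x sin y, from Example 6.4
def second_partials(x, y):
    return (-math.sin(x) * math.sin(y),   # f_xx
            math.cos(x) * math.cos(y),    # f_xy
            -math.sin(x) * math.sin(y))   # f_yy

assert classify(*second_partials(0.0, 0.0)) == "saddle"
assert classify(*second_partials(math.pi / 2, math.pi / 2)) == "local max"
```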
The function Q, or the matrix H, is called nondegenerate if Q(x₁, …, x_n) ≠ 0 whenever (x₁, …, x_n) is nonzero; positive definite if Q(x₁, …, x_n) > 0 whenever (x₁, …, x_n) is nonzero; and negative definite if Q(x₁, …, x_n) < 0 whenever (x₁, …, x_n) is nonzero.

Now suppose (0, …, 0) is a critical point. Then Taylor's formula becomes

    f(x₁, …, x_n) = f(0, …, 0) + (1/2) Q(x₁, …, x_n) + R.

When Q is nondegenerate, it controls the behaviour near the critical point. Here is the general second derivative test.

Theorem 6.5. If H is the Hessian evaluated at a critical point of f, then

(1) if H is positive definite, the point is a local minimum,
(2) if H is negative definite, the point is a local maximum,
(3) in all other cases, either H is nondegenerate and the point is a saddle point, or else H is degenerate and the test is inconclusive.

In order to turn this into a useful test, we need a method for checking when a matrix H is positive or negative definite. If H is a diagonal matrix, there is an obvious criterion: it is positive definite exactly when the diagonal entries are all positive. To extend this to general matrices, we recall a definition from linear algebra. A real or complex number λ is called an eigenvalue of H if there is a nonzero n × 1 column vector v, called an eigenvector, such that

    Hv = λv.

There is a standard method for finding the eigenvalues of a matrix, which can be found in any book on linear algebra; or else one can do this numerically using a computer. The following result is a consequence of the spectral theorem in linear algebra.

Theorem 6.6. If H is a real symmetric matrix, then all its eigenvalues are real. H is positive definite (respectively negative definite) if and only if all its eigenvalues are positive (respectively negative).

Combined with the previous theorem, this gives a practical test for finding maxima and minima.
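The eigenvalue criterion of Theorem 6.6 is easy to apply numerically. The sketch below assumes NumPy is available; the Hessian is for f(x, y, z) = x² + 2y² + 3z² + xy, an example of my own (its Hessian is constant, so this is also its value at the critical point at the origin).

```python
import numpy as np

# Hessian of f(x, y, z) = x^2 + 2y^2 + 3z^2 + xy: entries are the
# second partials f_xx = 2, f_yy = 4, f_zz = 6, f_xy = 1 (others zero).
H = np.array([[2.0, 1.0, 0.0],
              [1.0, 4.0, 0.0],
              [0.0, 0.0, 6.0]])

eigenvalues = np.linalg.eigvalsh(H)   # all real, since H is symmetric
assert np.all(eigenvalues > 0)        # positive definite: a local minimum
```

Here the eigenvalues are 3 ± √2 and 6, all positive, so the critical point is a local minimum, consistent with Theorem 6.5.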