11. Higher derivatives

We first record a very useful:

Theorem 11.1. Let $A \subset \mathbb{R}^n$ be an open subset. Let $f \colon A \to \mathbb{R}^m$ and $g \colon A \to \mathbb{R}^m$ be two functions, suppose that $P \in A$, and let $\lambda$ be a scalar. If $f$ and $g$ are differentiable at $P$, then

(1) $f + g$ is differentiable at $P$ and
\[
D(f+g)(P) = Df(P) + Dg(P).
\]
(2) $\lambda f$ is differentiable at $P$ and
\[
D(\lambda f)(P) = \lambda\, Df(P).
\]
Now suppose that $m = 1$.
(3) $fg$ is differentiable at $P$ and
\[
D(fg)(P) = Df(P)\,g(P) + f(P)\,Dg(P).
\]
(4) If $g(P) \neq 0$, then $f/g$ is differentiable at $P$ and
\[
D(f/g)(P) = \frac{Df(P)\,g(P) - f(P)\,Dg(P)}{g^2(P)}.
\]

If the partial derivatives of $f$ and $g$ exist and are continuous, then (11.1) follows from the well-known single variable case. One can prove the general case of (11.1) by hand (basically lots of $\epsilon$'s and $\delta$'s). However, perhaps the best way to prove (11.1) is to use the chain rule, proved in the next section.

What about higher derivatives?

Definition 11.2. Let $A \subset \mathbb{R}^n$ be an open set and let $f \colon A \to \mathbb{R}$ be a function. The $k$th order partial derivative of $f$, with respect to the variables $x_{i_1}, x_{i_2}, \dots, x_{i_k}$, is the iterated derivative
\[
\frac{\partial^k f}{\partial x_{i_k}\,\partial x_{i_{k-1}} \cdots \partial x_{i_2}\,\partial x_{i_1}}(P)
= \left( \frac{\partial}{\partial x_{i_k}}\left( \frac{\partial}{\partial x_{i_{k-1}}}\left( \cdots \left( \frac{\partial f}{\partial x_{i_1}} \right) \cdots \right)\right)\right)(P).
\]
We will also use the notation $f_{x_{i_k} x_{i_{k-1}} \dots x_{i_2} x_{i_1}}(P)$.

Example 11.3. Let $f \colon \mathbb{R}^2 \to \mathbb{R}$ be the function $f(x,t) = e^{-a^2 t} \cos x$. Then
\[
f_{xx}(x,t) = \frac{\partial}{\partial x}\left( \frac{\partial}{\partial x}\left( e^{-a^2 t} \cos x \right)\right)
= \frac{\partial}{\partial x}\left( -e^{-a^2 t} \sin x \right)
= -e^{-a^2 t} \cos x.
\]
On the other hand,
\[
f_{xt}(x,t) = \frac{\partial}{\partial x}\left( \frac{\partial}{\partial t}\left( e^{-a^2 t} \cos x \right)\right)
= \frac{\partial}{\partial x}\left( -a^2 e^{-a^2 t} \cos x \right)
= a^2 e^{-a^2 t} \sin x.
\]
Similarly,
\[
f_{tx}(x,t) = \frac{\partial}{\partial t}\left( \frac{\partial}{\partial x}\left( e^{-a^2 t} \cos x \right)\right)
= \frac{\partial}{\partial t}\left( -e^{-a^2 t} \sin x \right)
= a^2 e^{-a^2 t} \sin x.
\]
Note that $f_t(x,t) = -a^2 e^{-a^2 t} \cos x$. It follows that $f(x,t)$ is a solution to the heat equation:
\[
a^2 \frac{\partial^2 f}{\partial x^2} = \frac{\partial f}{\partial t}.
\]

Definition 11.4. Let $A \subset \mathbb{R}^n$ be an open subset and let $f \colon A \to \mathbb{R}^m$ be a function. We say that $f$ is of class $C^k$ if all $k$th partial derivatives exist and are continuous. We say that $f$ is of class $C^\infty$ (aka smooth) if $f$ is of class $C^k$ for all $k$.

In lecture 10 we saw that if $f$ is $C^1$, then it is differentiable.

Theorem 11.5. Let $A \subset \mathbb{R}^n$ be an open subset and let $f \colon A \to \mathbb{R}^m$ be a function. If $f$ is $C^2$, then
\[
\frac{\partial^2 f}{\partial x_i\,\partial x_j} = \frac{\partial^2 f}{\partial x_j\,\partial x_i},
\]
for all $1 \leq i, j \leq n$.

The proof uses the Mean Value Theorem.

Suppose we are given an open subset $A \subset \mathbb{R}$ and a function $f \colon A \to \mathbb{R}$ of class $C^1$. The objective is to find a solution to the equation $f(x) = 0$. Newton's method proceeds as follows. Start with some $x_0 \in A$. The best linear approximation to $f(x)$ in a neighbourhood of $x_0$ is given by
\[
f(x_0) + f'(x_0)(x - x_0).
\]
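The equality of mixed partials in Theorem 11.5, and the heat equation identity in Example 11.3, can be checked numerically. The following sketch uses central finite differences; the value $a = 1.5$, the sample point, and the step size $h$ are arbitrary choices made for illustration:

```python
import math

# f(x, t) = exp(-a^2 t) cos x, the function of Example 11.3, with sample a = 1.5
A = 1.5

def f(x, t):
    return math.exp(-A * A * t) * math.cos(x)

h = 1e-5  # step size for central differences

def d_dx(g, x, t):
    return (g(x + h, t) - g(x - h, t)) / (2 * h)

def d_dt(g, x, t):
    return (g(x, t + h) - g(x, t - h)) / (2 * h)

x0, t0 = 0.7, 0.3  # an arbitrary sample point

# second partials via nested central differences
f_xx = d_dx(lambda x, t: d_dx(f, x, t), x0, t0)
f_xt = d_dx(lambda x, t: d_dt(f, x, t), x0, t0)  # differentiate in t, then in x
f_tx = d_dt(lambda x, t: d_dx(f, x, t), x0, t0)  # differentiate in x, then in t
f_t  = d_dt(f, x0, t0)

# the mixed partials agree (Theorem 11.5), and f solves a^2 f_xx = f_t
assert abs(f_xt - f_tx) < 1e-4
assert abs(A * A * f_xx - f_t) < 1e-4
```

The tolerances are loose because nested finite differences amplify floating-point rounding error; this is only a sanity check, not a proof.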
If $f'(x_0) \neq 0$, then the linear equation
\[
f(x_0) + f'(x_0)(x - x_0) = 0
\]
has the unique solution
\[
x_1 = x_0 - \frac{f(x_0)}{f'(x_0)}.
\]
Now just keep going (assuming that $f'(x_i)$ is never zero),
\begin{align*}
x_1 &= x_0 - \frac{f(x_0)}{f'(x_0)} \\
x_2 &= x_1 - \frac{f(x_1)}{f'(x_1)} \\
&\ \vdots \\
x_n &= x_{n-1} - \frac{f(x_{n-1})}{f'(x_{n-1})}.
\end{align*}

Claim 11.6. Suppose that $x^* = \lim_{n \to \infty} x_n$ exists and $f'(x^*) \neq 0$. Then $f(x^*) = 0$.

Proof of (11.6). Indeed, we have
\[
x_n = x_{n-1} - \frac{f(x_{n-1})}{f'(x_{n-1})}.
\]
Take the limit as $n$ goes to $\infty$ of both sides:
\[
x^* = x^* - \frac{f(x^*)}{f'(x^*)},
\]
where we used the fact that $f$ and $f'$ are continuous and $f'(x^*) \neq 0$. But then $f(x^*) = 0$, as claimed.

Suppose that $A \subset \mathbb{R}^n$ is open and $f \colon A \to \mathbb{R}^n$ is a function. Suppose that $f$ is $C^1$ (that is, suppose each of the coordinate functions $f_1, f_2, \dots, f_n$ is $C^1$). The objective is to find a solution to the equation $f(P) = 0$. Start with any point $P_0 \in A$. The best linear approximation to $f$ at $P_0$ is given by
\[
f(P_0) + Df(P_0)(P - P_0).
\]
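The one-variable iteration above is straightforward to code. Here is a minimal Python sketch; the sample function $f(x) = x^2 - 2$ (whose positive root is $\sqrt{2}$), the starting point, and the step count are arbitrary choices made for illustration:

```python
def newton(f, fprime, x0, steps=20):
    """Iterate x_n = x_{n-1} - f(x_{n-1}) / f'(x_{n-1}) starting from x0."""
    x = x0
    for _ in range(steps):
        d = fprime(x)
        if d == 0:  # the method requires f'(x_n) != 0 at every step
            raise ZeroDivisionError("f'(x_n) = 0; Newton step undefined")
        x = x - f(x) / d
    return x

# sample problem: solve x^2 - 2 = 0 starting from x0 = 1
root = newton(lambda x: x * x - 2, lambda x: 2 * x, x0=1.0)
assert abs(root - 2 ** 0.5) < 1e-12
```

Note the quadratic convergence near a simple root: a handful of steps already reaches machine precision here.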
Assume that $Df(P_0)$ is an invertible matrix, that is, assume that $\det Df(P_0) \neq 0$. Then the inverse matrix $Df(P_0)^{-1}$ exists and the unique solution to the linear equation
\[
f(P_0) + Df(P_0)(P - P_0) = 0
\]
is given by
\[
P_1 = P_0 - Df(P_0)^{-1} f(P_0).
\]
Notice that matrix multiplication is not commutative, so that there is a difference between $Df(P_0)^{-1} f(P_0)$ and $f(P_0)\, Df(P_0)^{-1}$. If possible, we get a sequence of solutions,
\begin{align*}
P_1 &= P_0 - Df(P_0)^{-1} f(P_0) \\
P_2 &= P_1 - Df(P_1)^{-1} f(P_1) \\
&\ \vdots \\
P_n &= P_{n-1} - Df(P_{n-1})^{-1} f(P_{n-1}).
\end{align*}
Suppose that the limit $P^* = \lim_{n \to \infty} P_n$ exists and that $Df(P^*)$ is invertible. As before, if we take the limit of both sides, this implies that $f(P^*) = 0$.

Let us try a concrete example.

Example 11.7. Solve
\begin{align*}
x^2 + y^2 &= 1 \\
y^2 &= x^3.
\end{align*}
First we write down an appropriate function, $f \colon \mathbb{R}^2 \to \mathbb{R}^2$, given by
\[
f(x, y) = (x^2 + y^2 - 1,\; y^2 - x^3).
\]
Then we are looking for a point $P$ such that $f(P) = (0, 0)$. Then
\[
Df(P) = \begin{pmatrix} 2x & 2y \\ -3x^2 & 2y \end{pmatrix}.
\]
The determinant of this matrix is $4xy + 6x^2 y = 2xy(2 + 3x)$. Now if we are given a matrix
\[
\begin{pmatrix} a & b \\ c & d \end{pmatrix},
\]
then we may write down the inverse by hand,
\[
\frac{1}{ad - bc}\begin{pmatrix} d & -b \\ -c & a \end{pmatrix}.
\]
So
\[
Df(P)^{-1} = \frac{1}{2xy(2 + 3x)} \begin{pmatrix} 2y & -2y \\ 3x^2 & 2x \end{pmatrix}.
\]
So,
\begin{align*}
Df(P)^{-1} f(P) &= \frac{1}{2xy(2 + 3x)} \begin{pmatrix} 2y & -2y \\ 3x^2 & 2x \end{pmatrix} \begin{pmatrix} x^2 + y^2 - 1 \\ y^2 - x^3 \end{pmatrix} \\
&= \frac{1}{2xy(2 + 3x)} \begin{pmatrix} 2x^2 y - 2y + 2x^3 y \\ x^4 + 3x^2 y^2 - 3x^2 + 2xy^2 \end{pmatrix}.
\end{align*}
One nice thing about this method is that it is quite easy to implement on a computer. Here is what happens if we start with $(x_0, y_0) = (5, 2)$,
\begin{align*}
(x_0, y_0) &= (5.00000000000000,\ 2.00000000000000) \\
(x_1, y_1) &= (3.24705882352941,\ -0.61764705882353) \\
(x_2, y_2) &= (2.09875150983980,\ 1.37996311951634) \\
(x_3, y_3) &= (1.377480405610,\ 0.5610968705054) \\
(x_4, y_4) &= (0.95901654346683,\ 0.503839504009063) \\
(x_5, y_5) &= (0.7876550355685,\ 0.6578307357845) \\
(x_6, y_6) &= (0.75591879660404,\ 0.655438554539110),
\end{align*}
and if we start with $(x_0, y_0) = (5, 5)$,
\begin{align*}
(x_0, y_0) &= (5.00000000000000,\ 5.00000000000000) \\
(x_1, y_1) &= (3.24705882352941,\ 1.85294117647059) \\
(x_2, y_2) &= (2.09875150983980,\ 0.3635417055958) \\
(x_3, y_3) &= (1.377480405610,\ 0.306989760884339) \\
(x_4, y_4) &= (0.95901654346683,\ 0.5615899471130) \\
(x_5, y_5) &= (0.7876550355685,\ 0.6449641848458) \\
(x_6, y_6) &= (0.75591879660404,\ 0.65551917668858).
\end{align*}
One can sketch the two curves and check that these give reasonable solutions. One can also check that $(x_6, y_6)$ lies close to the two given curves, by computing $x_6^2 + y_6^2 - 1$ and $y_6^2 - x_6^3$.
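The computation above can be sketched in a few lines of Python, using the explicit $2 \times 2$ inverse worked out above rather than a linear algebra library. The starting point and the iteration count are choices made for illustration:

```python
def step(x, y):
    """One Newton step P -> P - Df(P)^{-1} f(P) for
    f(x, y) = (x^2 + y^2 - 1, y^2 - x^3)."""
    f1 = x * x + y * y - 1
    f2 = y * y - x ** 3
    det = 2 * x * y * (2 + 3 * x)  # det Df = 4xy + 6x^2 y
    # components of Df(P)^{-1} f(P), with Df = [[2x, 2y], [-3x^2, 2y]]
    u = (2 * y * f1 - 2 * y * f2) / det
    v = (3 * x * x * f1 + 2 * x * f2) / det
    return x - u, y - v

x, y = 5.0, 2.0  # the first starting point used above
for _ in range(12):
    x, y = step(x, y)

# the limit lies on both curves: x^2 + y^2 = 1 and y^2 = x^3
assert abs(x * x + y * y - 1) < 1e-9
assert abs(y * y - x ** 3) < 1e-9
```

One could equally well solve the linear system $Df(P)\,v = f(P)$ at each step instead of forming the inverse; for a $2 \times 2$ system the explicit formula is harmless, but for larger systems solving is preferred.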
MIT OpenCourseWare
http://ocw.mit.edu

18.022 Calculus of Several Variables
Fall 2010

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.