Math 320-3: Lecture Notes Northwestern University, Spring 2015


Math 320-3: Lecture Notes
Northwestern University, Spring 2015
Written by Santiago Cañez

These are lecture notes for Math 320-3, the third quarter of Real Analysis, taught at Northwestern University in the spring of 2015. The book used was the 4th edition of An Introduction to Analysis by Wade. Watch out for typos! Comments and suggestions are welcome.

Contents

March 30, 2015: Limits and Continuity
April 1, 2015: Linear Transformations, Partial Derivatives
April 3, 2015: Second-Order Partial Derivatives
April 6, 2015: Clairaut's Theorem, Differentiability
April 8, 2015: More on Differentiability
April 10, 2015: Yet More on Derivatives
April 13, 2015: The Chain Rule
April 15, 2015: Mean Value Theorem
April 17, 2015: More on Mean Value, Taylor's Theorem
April 20, 2015: Inverse Function Theorem
April 22, 2015: Implicit Function Theorem
April 24, 2015: More on Implicit Functions
April 27, 2015: Jordan Measurability
May 1, 2015: Riemann Integrability
May 4, 2015: More on Integrability
May 6, 2015: Fubini's Theorem
May 8, 2015: Change of Variables
May 11, 2015: Curves
May 13, 2015: Surfaces
May 15, 2015: Orientations
May 18, 2015: Vector Line/Surface Integrals
May 22, 2015: Green's Theorem
May 27, 2015: Stokes' Theorem
May 29, 2015: Gauss's Theorem
June 1, 2015: Differential Forms

March 30, 2015: Limits and Continuity

Welcome to the final quarter of real analysis! This quarter, to use an SAT-style analogy, is to multivariable calculus what the first quarter was to single-variable calculus. That is, this quarter is all about making precise the various concepts you would see in a multivariable calculus course (multivariable derivatives and integrals, vector calculus, and Stokes' Theorem) and pushing them further. After giving a broad introduction to this, we started talking about limits.

The Euclidean norm. First we clarify some notation we'll be using all quarter long. The book discusses this in Chapter 8, which is essentially a review of linear algebra. For a point a ∈ R^n written in components as a = (a_1, ..., a_n), the (Euclidean) norm of a is

‖a‖ = √(a_1^2 + · · · + a_n^2),

and is nothing but the usual notion of length when we think of a as a vector. Thus for a, b ∈ R^n, the norm of their difference,

‖a − b‖ = √((a_1 − b_1)^2 + · · · + (a_n − b_n)^2),

is nothing but the Euclidean distance between a and b, so in the language of metrics from last quarter we have d(a, b) = ‖a − b‖, and ‖a‖ is then the distance from a to the origin 0. Using what we know about this Euclidean distance, sometimes we can phrase things in terms of the taxicab or box metrics instead, but ‖·‖ will always denote Euclidean length.

Limits. Suppose that V ⊆ R^n is open and that a ∈ R^n. For a function f : V \ {a} → R^m, we say that the limit of f as x approaches a is L ∈ R^m if for all ε > 0 there exists δ > 0 such that

0 < ‖x − a‖ < δ implies ‖f(x) − L‖ < ε.

Limits are unique when they exist, and we use the notation lim_{x→a} f(x) = L when this is the case. The book discusses such limits in Chapter 9, which we skipped last quarter in favor of the metric space material in Chapter 10. These notes should cover everything we'll need to know about limits, but it won't hurt to briefly glance over Chapter 9 on your own.

Remarks. A few remarks are in order. First, since the two norms used in the definition give distances in R^n and R^m respectively, it is clear that a similar definition works for metric spaces in general. Indeed, all we do is take the ordinary definition of a limit for single-variable functions from first-quarter analysis and replace absolute values by metrics; the special case where we use the Euclidean metric on R^n and R^m results in the definition we have here. But this quarter we won't be using general metric spaces, so the special case above will be enough.

Second, we should clarify why we are considering the domain of the function to be V \ {a} where V is open in R^n. The fact that we exclude a from the domain just indicates that the definition works perfectly well for functions which are not defined at a itself, since when considering limits we never care about what is happening at a, only what is happening near a. (The 0 < ‖x − a‖ in the definition is what excludes x = a as a possible value.) More importantly, why are we assuming that V is open? The idea is that in order for the limit as x → a to exist, we should allow x to approach a from any possible direction, and in order for this

3 to be the case we have to guarantee that our function is defined along all such possible directions near a. Saying that V is open guarantees that we can find a ball around a which is fully contained within V, so that it makes sense to take any possible direction towards a and remain within V. This will be a common theme throughout this quarter, in that whenever we have a definition which can be phrased in terms of limits, we will assume that the functions in question are defined on open subsets of R n. Proposition. Here are two basic facts when working with multivariable limits. First, just as we saw for single-variable limits in Math 320-1, instead of using ɛ s and δ s we can characterize limits in terms of sequences instead: lim f(x) = L if and only if for any sequence x n a with all x n a, we have f(x n ) L. x a One use of this is the following: if we can find two sequences x n and y n which both converge to a (and none of those terms are equal to a) such that either f(x n ) and f(y n ) converge to different things or one of these image sequences does not converge, then lim x a f(x) does not exist. The second fact is the one which makes working with multivariable limits more manageable, since it says that we can just work with component-wise limits instead. To be precise, we can write f : V \{a} R m in terms of its components as f(x) = (f 1 (x),..., f m (x)) where each f i : V \{a} R maps into R 1, and we can write L R m in terms of components as L = (L 1,..., L m ). Then: lim f(x) = L if and only if for each i = 1,..., m, lim f i(x) = L i, x a x a so that a multivariable limit exists if and only if the component-wise limits exist, and the value of the multivariable limit is a point whose components are the individual component-wise limits. This is good, since working with inequalities in R is simpler than working with inequalities in R m. The proof of the first fact is the same as the proof of the corresponding statement for singlevariable limits, only replacing absolute values in R by norms in R n. The second fact comes from the sequential characterization of limits and the fact that, as we saw last quarter, a sequence in R m converges if and only if its individual component sequences converge. Example. Define f : R 2 \{(0, 0)} R 2 by ( ) x 4 + y 4 xy f(x, y) = x 2 + y 2,. 3 x 2 + y 2 We show that the limit of f as (x, y) (0, 0) is (0, 0). To get a sense for why this is the correct value of the limit, recall the technique of converting to polar coordinates from multivariable calculus. Making the substitution x = r cos θ and y = r sin θ, we get x 4 + y 4 x 2 + y 2 = r2 (cos 4 θ + sin 4 θ) and xy 3 x 2 + y = r1/3 cos θ sin θ. 2 The pieces involving sine and cosine are bounded, and so the factors of r leftover will force the limits as r 0 to be zero. Still, we will prove this precisely using the definition we gave above, or rather the second fact in the proposition above. 3
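The following short numerical check is not part of the original notes; it is a sketch illustrating the example above, assuming the second component of f is xy/∛(x^2 + y^2) (the form suggested by the cube root and the polar computation). Evaluating f along points shrinking toward the origin shows both components tending to 0, which is what the ε-δ argument below proves rigorously.

```python
import numpy as np

def f(x, y):
    """The example map on R^2 minus the origin; the second component is assumed
    to be x*y / (x^2 + y^2)^(1/3)."""
    r2 = x**2 + y**2
    return np.array([(x**4 + y**4) / r2, x * y / r2**(1 / 3)])

# Evaluate along two different sequences approaching (0, 0); both components shrink to 0.
for t in [1e-1, 1e-2, 1e-3, 1e-4]:
    print(t, f(t, t), f(t, -t**2))
```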

4 Before doing so, note how messy it would be verify the definition directly without the fact about component-wise limits. For a fixed ɛ > 0, we would need to find δ > 0 such that 0 < x 2 + y 2 < δ implies f(x, y) (0, 0) = ( x 4 + y 4 ) ( 2 ) 2 xy x 2 + y 2 + < ɛ. 3 x 2 + y 2 The complicated nature of the term on the right suggests that this might not be very straightforward, and is why looking at the component-wise limits instead is simpler. Thus, first we claim that x 4 + y 4 lim (x,y) (0,0) x 2 + y 2 = 0. Indeed, for a fixed ɛ > 0 let δ = ɛ > 0. Since x 4 + y 4 x 4 + 2x 2 y 2 + y 4 = (x 2 + y 2 ) 2, for (x, y) such that 0 < x 2 + y 2 < δ we have: f(x, y) 0 = x 4 + y 4 x 2 + y 2 x2 + y 2 < δ 2 = ɛ as required. To justify that lim (x,y) (0,0) xy 3 x 2 + y = 0, 2 for a fixed ɛ > 0 we set δ = ɛ 3 > 0. Since for (x, y) such that 0 < x 2 + y 2 < δ we have: xy max{x 2, y 2 } x 2 + y 2, xy 3 x 2 + y (x2 + y 2 ) 1/2 2 (x 2 + y 2 ) 1/3 = (x2 + y 2 ) 1/6 = ( x 2 + y 2 ) 1/3 < δ 1/3 = ɛ, as required. Hence since the component-wise limits of f as (x, y) (0, 0) both exist and equal zero, we conclude that f(x, y) exists and equals (0, 0). lim (x,y) (0,0) Important. A multivariable limit exists if and only if its component-wise limits exist. Thus, in order to determine the value of such limits, most of the time it will be simpler to look at the component-wise limits instead. Continuity. We have already seen what it means for a function R n R m to be continuous, by taking any of the versions of continuity we had for functions between metric spaces last quarter and specializing them to the Euclidean metric. For instance, the ɛ-δ definition looks like: f : R n R m is continuous at a R n if for any ɛ > 0 there exists δ > 0 such that x a < δ implies f(x) f(a) < ɛ. We also have the characterization in terms of sequences and the one in terms of pre-images of open sets, both of which will be useful going forward. But now we can add one more phrased in terms of limits, which is really just a rephrasing of either the ɛ-δ definition of the sequential definition: f : R n R m is continuous at a R n if and 4

5 only if lim x a f(x) = f(a). In other words, saying that a function is continuous at a point means that the limit as you approach that point is the value of the function at that point. Back to the example. The function from the previous example was undefined at (0, 0), but using the value we found for the limit in that problem we can now conclude that the function f : R 2 R 2 defined by ( ) x 4 +y 4 xy, x f(x, y) = 2 +y 3 (x, y) (0, 0) 2 x 2 +y 2 (0, 0) (x, y) = (0, 0) is actually continuous on all of R 2. Important. A function f : R n R m (or defined on some smaller domain) is continuous at a R n if and only if lim x a f(x) = f(a). April 1, 2015: Linear Transformations, Partial Derivatives Today we continued talking a bit about continuous functions, in particular focusing on properties of linear transformations, and then began talking about partial derivatives. Partial derivatives are the first step towards formalizing the concept of differentiability in higher dimensions, but as we saw, they alone aren t enough to get the job done. Warm-Up. Suppose that f : R n R m is a function and that for some a R n, lim x a f(x) = L exists. We show that f is then bounded on some open set containing a. To be precise, we show that there exists an open subset V R n which contains a and a constant M such that f(x) M for all x V. (We ll take this inequality as what it means for a multivariable function to be bounded and is equivalent to saying that the image of f is contained in a ball of finite radius, which matches up with the definition of bounded we had for metric spaces last quarter.) The ɛ-δ definition of a limit guarantees that for ɛ = 1 there exists δ > 0 such that The reverse triangle inequality if 0 < x a < δ, then f(x) L < 1. f(x) L f(x) L then implies that f(x) < 1 + L, so f is bounded among points of B δ (a) which are different from a. To include a as well we can simply make the bound 1 + L larger if need be, say to be the maximum of 1 + L and 1 + f(a). Thus for M = max{1 + L, 1 + f(a) } and V = B δ (a), we have f(x) M for all x V, so f is bounded on the open set V containing a as required. The point of this is to show that a function cannot be unbounded near a point at which it has a limit. Proposition. You might recall the following fact from a multivariable calculus course, which is essentially a rephrasing of the sequential characterization of limits: lim f(x) exists and equals L x a 5

6 if and only if the single-variable limit as you approach a along any possible path exists and equals L. This gives a nice rephrasing, and is useful when trying to show that limits don t exist. Linear transformations. We recall a concept from linear algebra, which will play an important role when discussing multivariable differentiability. A function T : R n R m is said to be a linear transformation if it has the following properties: T (x + y) = T (x) + T (y) for all x, y R n, and T (cx) = ct (x) for all x R n and c R. Thus, linear transformations preserve addition (the first property) and preserve scalar multiplication (the second property). Apart from the definition, the key fact to know is that these are precisely the types of functions which can be defined via matrix multiplication: for any linear transformation T : R n R m there exists an m n matrix B such that T (x) = Bx for all x R n. Here, when multiplying an element x of R n by a matrix, we are expressing x as a column vector. Thus, linear transformations concretely look like: T (x 1,..., x n ) = (a 11 x a 1n x n, a 21 x a 2n x n,..., a m1 x a mn x n ) for scalars a ij R. Here, the a ij are the entries of the matrix B and this expression is the result written as a row of the matrix product Bx, where we write x = (x 1,..., x n ) as a column. This expression makes it clear that linear transformations are always continuous since their component functions are continuous. Norms of linear transformations. Given a linear transformation T : R n R m, we define its norm T by: T := sup T (x). x =1 That is, T is the supremum of the vector norms T (x) in R m as x ranges through all vectors in R n of norm 1. (To see why such a restriction on the norm of x is necessary, note that nonzero linear transformations are always unbounded since for x 0, T (cx) = ct (x) = c T (x) gets arbitrarily large as c. Thus for nonzero T, sup x R n T (x) is always infinite.) To justify that the supremum used in the definition of T is always finite, note that the set {x R n x = 1} the supremum is being taken over is closed and bounded, so it is compact. Thus the composition {x R n x = 1} R m R defined by x T (x) T (x), which is continuous since it is the composition of continuous functions, has a maximum value by the Extreme Value Theorem, and this (finite) maximum value is then T. For us, the most important property of this concept of the norm of a linear transformation is the following inequality, which will soon give us a way to bound expressions involving multivariable derivatives: for a linear transformation T : R n R m, we have T (x) T x for any x R n. 6

7 To see this, note first that when x = 0, T (x) = 0 and both sides of the inequality are 0 in this case. When x 0, has norm 1 and so x x ( ) x T T x since the left side is among the quantities of which the right side is the supremum. Thus for x 0: ( T (x) = T x x ) ( ) ( ) = x x x T = x x x T x T x as claimed. The exact value of T for a given linear transformation will not be so important, only that it always finite and nonnegative. Remark. The book goes through this material in Chapter 8, where it gives a slightly different definition of T as: T (x) T := sup x 0 x. This is equivalent to our definition since we can rewrite the expressions of which we are taking the supremum as T (x) = 1 ( T (x) = 1 x x x T (x) x = T x ) x where x at the end as norm 1. Thus the book s definition of T can be reduced to one which only involves vectors of norm 1, which thus agrees with our definition. The book then goes onto to prove that T is always finite and that T (x) T x using purely linear-algebraic means and without using the fact that T is continuous; the (in fact uniform) continuity of T is then derived from this using the bound: T (x) T (y) = T (x y) T x y. I think our approach is nicer and more succinct, especially since it builds off of the Extreme Value Theorem and avoids linear algebra. But feel free to check the book for full details about this alternate approach to defining T. Important. For a linear transformation T : R n R m, T x T x for any x R n. Partial derivatives. Let f : V R be a function defined on an open subset V of R n, let x = (x 1,..., x n ) and let a = (a 1,..., a n ) V. The partial derivative of f with respect to x i at a is the derivative if it exists of the single-variable function g(x i ) := f(a 1,..., x i,..., a n ) obtained by holding x j fixed at a j for j i and only allowing x i to vary. To be clear, denoting this partial derivative by f x i (or f xj ) we have: f x i (a) = lim x i a i f(a 1,..., x i,..., a n ) f(a 1,..., a i,..., a n ) x i a i if this limits exists. (The fraction on the right is simply g(x i) g(a i ) x i a i function introduced above.) where g is the single-variable 7

8 Equivalently, by setting h = x i a i, we can rewrite this limit as: f f(a 1,..., a i + h,..., a n ) f(a 1,..., a i,..., a n ) (a) = lim, x i h 0 h which is analogous to the expression g(a i + h) g(a i ) lim h 0 h as opposed to g(x i ) g(a i ) lim x i a i x i a i for single-variable derivatives. Even more succinctly, by introducing the vector e i which has a 1 in the i-th coordinate and zeros elsewhere, we have: f f(a + he i ) f(a) (a) = lim, x i h 0 h again whenever this limit exists. For a function f : V R m written in components as f(x) = (f 1 (x),..., f n (x)), we say that the partial derivative of f with respect to x i at a exists when the partial derivative of each component function with respect to x i exists and we set Example. Define f : R 2 R by f x i (a) := f(x, y) = ( f1 x i (a),..., f n x i (a) ). { x 2 y 2 x 4 +y 4 (x, y) (0, 0) 0 (x, y) = (0, 0). We claim that both partial derivatives f x (0, 0) and f y (0, 0) exist and equal 0. Indeed, we have: and f x (0, 0) = f f(h, 0) f(0, 0) 0 0 (0, 0) = lim = lim = 0 x h 0 h h 0 h f y (0, 0) = f f(0, h) f(0, 0) 0 0 (0, 0) = lim = lim = 0, y h 0 h h 0 h where in both computations we use the fact that f(h, 0) = 0 = f(0, h) since we are considering h h 4 approaching 0 but never equal to 0 in such limits. Partial derivatives and continuity. Here s the punchline. Ideally, we would like the fact that differentiability implies continuity to be true in the higher-dimensional setting as well, and the previous example shows that if we try to characterize differentiability solely in terms of partial derivatives this won t be true. Indeed, both partial derivatives of that function exist at the origin and yet that function is not continuous at the origin: taking the limit as we approach (0, 0) along the x-axis gives: lim f(x, 0) = lim x 0 x 0 0 x 4 = 0 while taking the limit as we approach (0, 0) along the line y = x gives: x 4 lim f(x, x) = lim x 0 x 0 x 4 + x 4 = 1 2, 8

9 so lim (x,y) (0,0) f(x, y) does not exist, let alone equal f(0, 0) as would be required for continuity at the origin. The issue is that differentiability in higher-dimensions is a trickier beat and requires more than the existence of partial derivatives alone, and will force us to think harder about what derivatives are really meant to measure. We ll get to this next week. Important. Partial derivatives of a function R n R are defined as in a multivariable calculus course, by taking single-variable derivatives of the function in question as we hold all variables except for one constant. For functions mapping into R m where m > 1, partial derivatives are defined component-wise. The existence of all partial derivatives at a point is not enough to guarantee continuity at that point. April 3, 2015: Second-Order Partial Derivatives Today we spoke about second-order partial derivatives, focusing on an example where the mixed second-order derivatives are unequal. We then stated Clairaut s Theorem, which guarantees that such mixed second-order derivatives are in fact equal as long as they re continuous. Warm-Up 1. We determine the partial differentiability of the function f : R 2 R defined by { 2x 3 +3y 4 (x, y) (0, 0) f(x, y) = x 2 +y 2 0 (x, y) = (0, 0). First, for any (x, y) (0, 0), there exists an open set around (x, y) which excludes the origin and hence the function f on this open set agrees with the function g(x, y) = 2x3 + 3y 4 x 2 + y 2. Since the limits defining partial derivatives only care about what s happening close enough to the point being approached, this means that the partial differentiability of f at any (x, y) (0, 0) is the same as that of f. Thus because both partial derivatives g x (x, y) and g y (x, y) exist at any (x, y) (0, 0) since g is a quotient of partially-differentiable functions with nonzero denominator, we conclude that f x (x, y) and f y (x, y) exist at any (x, y) (0, 0) as well. (The values of these partial derivatives are obtained simply by differentiating with respect to the one variable at a time using the quotient rule, as you would have done in a multivariable calculus course.) The reasoning above doesn t work at (0, 0), in which case we fall back to the limit definition of partial derivatives. We have: and f(h, 0) f(0, 0) 2h 0 f x (0, 0) = lim = lim = 2 h 0 h h 0 h f(0, h) f(0, 0) 3h 2 0 f y (0, 0) = lim = lim = 0. h 0 h h 0 h To be clear, for f x (0, 0) we are taking a limit where only x varies and y is held constant at 0, and for f y (0, 0) we hold x constant at 0 and take a limit where only y varies. We conclude that f x and f y exist on all of R 2. Warm-Up 2. Suppose that f : R n R m is such that all partial derivatives of f exist and are equal to zero throughout R n. We show that f must be constant. To keep the notation and geometric 9

10 picture simpler, we only do this for the case where n = 2 and m = 1, but the general case is very similar. The issue is that f x and f y only gives us information about how f behaves in two specific directions, whereas to say that f is constant we have to consider how f behaves on all of R 2. Concretely, suppose that (a, b), (c, d) R 2 ; we want to show that f(a, b) = f(c, d). To get some intuition we will use the following picture which assumes that (a, b) and (c, d) do not lie on the same horizontal nor straight line in R 2, but our proof works even when this is the case: Consider the point (c, b). Since f x (x, y) = 0 for all (x, y) R 2, f is constant along any horizontal line, and thus along the line connecting (a, b) and (c, b). Hence f(a, b) = f(c, b). Now, since f y (x, y) = 0 for all (x, y) R 2, f is constant along any vertical line and thus along the vertical line connecting (c, b) and (c, d). Hence f(c, b) = f(c, d) and together with the previous equal we get f(a, b) = f(c, b) = f(c, d) as required. We conclude that f is constant on R 2, and mention again that a similar argument works in a more general R n R m setting. Remark. The key take away from the second Warm-Up is that we were able to use information about the behavior of f in specific directions to conclude something about the behavior of f overall. In particular, the fact that f x = 0 everywhere implies that f is constant along horizontal lines is a consequence of the single-variable Mean Value Theorem applied to the x-coordinate, and the fact that f y = 0 everywhere implies that f is constant along vertical lines comes from the single-variable Mean Value Theorem applied to the y-coordinate. We ll see a similar application of the Mean Value Theorem one coordinate at-a-time in the proof of Clairaut s Theorem. Important. Sometimes, but not always, behavior of a function along specific directions can be pieced together to obtain information about that function everywhere. Higher-order partial derivatives. Second-order partial derivatives are defined by partial differentiating first-order partial derivatives, and so on for third, fourth, and higher-order partial derivatives. In particular, for a function f of two variables, there are four total second-order partial derivatives we can take: 2 f x 2 := x ( f x ), 2 f y x := y ( ) f, x 2 f x y := x ( ) f, y 2 f y 2 := y ( ) f. y We say that a function f : V R m, where V R n is open, is C p on V if all partial derivatives up to and including the p-th order ones exist and are continuous throughout V. 10
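As an aside (not in the original notes), here is a small numerical sketch of the definition just given: a second-order partial derivative is simply a partial derivative of a partial derivative, so it can be estimated by nesting difference quotients. The test function g and the step size below are my own choices; for this smooth g the two mixed estimates happen to agree, while the example that follows shows that this can fail in general.

```python
import numpy as np

def mixed_partials(f, x, y, h=1e-4):
    """Central finite-difference estimates of f_xy = (f_x)_y and f_yx = (f_y)_x at (x, y)."""
    fx = lambda a, b: (f(a + h, b) - f(a - h, b)) / (2 * h)   # approximates f_x
    fy = lambda a, b: (f(a, b + h) - f(a, b - h)) / (2 * h)   # approximates f_y
    f_xy = (fx(x, y + h) - fx(x, y - h)) / (2 * h)            # partial of f_x in y
    f_yx = (fy(x + h, y) - fy(x - h, y)) / (2 * h)            # partial of f_y in x
    return f_xy, f_yx

g = lambda x, y: np.exp(x * y) + np.sin(x) * y**3             # a smooth test function
print(mixed_partials(g, 0.7, -0.3))                           # the two estimates agree
```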

11 Example. This is a standard example showing that f xy and f yx are not necessary equal, contrary to what you might remember from a multivariable calculus course. The issue is that these so-called mixed second-order derivatives are only guaranteed to be equal when at least one is continuous, which is thus not the case in this example. This example is so commonly used that even after hours of searching I was unable to find a different one which illustrates this same concept. This example is in our book, but we ll flesh out some of the details which the book glosses over. Consider the function f : R 2 R defined by f(x, y) = { xy ( x 2 y 2 x 2 +y 2 ) (x, y) (0, 0) 0 (x, y) = (0, 0). We claim that this function is C 1 and that both second-order derivatives f xy (0, 0) and f yx (0, 0) exist at the origin and are not equal. First, as in the Warm-Up, the existence of f x (x, y) and f y (x, y) for (x, y) (0, 0) follow from the fact that f agrees with the function near such (x, y). By the quotient rule we have: ( x 2 y 2 ) g(x, y) = xy x 2 + y 2 ( (x 2 + y 2 )2x (x 2 y 2 ) ( )2x x 2 y 2 ) 4xy 2 ( x 2 f x (x, y) = xy (x 2 + y 2 ) 2 + y x 2 + y 2 = xy (x 2 + y 2 ) 2 + y y 2 ) x 2 + y 2 for (x, y) (0, 0). To check the existence of f x (0, 0) we compute: so f x (0, 0) = 0. Thus we have: f(h, 0) f(0, 0) 0 lim = lim h 0 h h 0 h = 0, ( ) x 2 y 2 x 2 +y 2 { xy 4xy 2 + y (x, y) (0, 0) (x f x (x, y) = 2 +y 2 ) 2 0 (x, y) = (0, 0). Now, f is continuous at (x, y) (0, 0) again because f agrees with the continuous expression 4xy 2 ( x 2 xy (x 2 + y 2 ) 2 + y y 2 ) x 2 + y 2 at and near such points. To check continuity at the origin we must show that lim f x(x, y) = 0 = f x (0, 0). (x,y) (0,0) We use the inequality 2 xy x 2 + y 2 which comes from rearranging the terms in 0 ( x y ) 2 = x 2 2 xy + y 2. This gives 4 xy 2 (x 2 + y 2 ) 2, so for (x, y) (0, 0) we have: f x (x, y) = xy 4xy 2 ( x 2 (x 2 + y 2 ) 2 + y y 2 ) x 2 + y 2 11

12 xy 4xy 2 ( x 2 (x 2 + y 2 ) 2 + y 2 ) y x 2 + y 2 = 4 xy 2 y (x 2 + y 2 ) 2 + y x 2 y 2 x 2 + y 2 (x2 + y 2 ) 2 y (x 2 + y 2 ) 2 + y x 2 y 2 x 2 + y 2 = y + y = 2 y, where we use the fact that x2 y 2 1 since the denominator is larger than the numerator. Since x 2 +y 2 2 y 0 as (x, y) (0, 0), f x (x, y) 2 y implies that f x (x, y) 0 as (x, y) (0, 0) and hence that f x (x, y) 0 as well. Thus f x is continuous on all of R 2 as claimed. Similar reasoning shows that f y exists and is continuous on all of R 2, but we ll leave this verification to the homework. Next we claim that f xy (0, 0) exists and equals 1. Note that for y 0 we have ( ) 0 y 2 f x (0, y) = 0 + y 0 + y 2 = y, and that this expression also gives the correct value of f x (0, 0) = 0. function f x (0, y) is given by f x (0, y) = y Thus the single-variable for all y, and hence its single-variable derivative at y = 0 is 1. But this single-variable derivative is precisely the definition of f xy (0, 0), so f xy (0, 0) = 1 as claimed. A similar computation which is also left to the homework will show that f yx (0, 0) = 1, so f xy (0, 0) f yx (0, 0) as was to be shown. Clairaut s Theorem. As the example above shows, f xy and f yx do not have to agree in general. However, if these second-order derivatives are continuous, then they must be equal. More generally, we have: Suppose that f : V R m is C 2 on an open subset V of R n and that x 1,..., x n are variables on V. Then f xi x j (a, b) = f xj x i (a, b) for any i and j and any (a, b) V. Thus continuity of mixed second order derivatives guarantees their equality. The name Clairaut s Theorem for this result is very common although our book doesn t use it and instead only refers to this as Theorem The book actually proves a slightly stronger version which instead of assuming that f is C 2 only assumes it is C 1 and that one of the mixed second-order derivatives f xi x j exists and is continuous at (a, b) -the existence of the other mixed derivative f xj x i (a, b) and the fact that it s equal to f xi x j (a, b) is then derived as a consequence. The ideas behind the proofs of this stronger version and our version are the same, so we ll only focus on our version. We ll save the proof for next time. Finally, we ll state that this result about second-order derivatives implies similar ones about higher-order derivatives. For instance, if f is C 3, then applying Clairaut s Theorem to the C 2 function f xi gives the equality of f xi x j x k and f xi x k x j, and so on. Important. Mixed second (and higher) order derivatives do not always agree, except when they re continuous as is the case for C 2 (or C p with p > 2) functions. What Clairaut s Theorem is really about. We ll finish today by going off on a bit of a tangent, to allude to some deeper meaning behind Clairaut s Theorem. This is something we might come 12

13 back to later on when we talk about surfaces, but even if we do we won t go into it very deeply at all. Take a course on differential geometry to really learn what this is all about. We denote the sphere in R 3 by S 2. (In general, S n denotes the set of points in R n+1 at distance 1 from the origin.) Then we can develop much of what we re doing this quarter for functions f : S 2 R defined on S 2. In particular, we can make sense of partial derivatives of such a function with respect to variables (s, t) which are variables along the sphere. (In more precise language, the sphere can be parametrized using parametric equations, and s and t are the parameters of these equations.) Then we can also compute second-order partial derivatives, and ask whether the analog of Clairaut s Theorem still holds. The fact is that this analog does not hold in this new setting: f st is not necessarily equal to f ts for f : S 2 R even if both of these second-order derivatives are continuous! The issue is that there sphere S 2 is a curved surface while R 2 (or more generally R n ) is what s called flat. The fact that Clairaut s Theorem holds on R n but not on S 2 reflects the fact that R n has zero curvature but that S 2 has nonzero (in fact positive) curvature. The difference f xy f yx measures the extent to which a given space is curved: this difference is identically zero on R 2 but nonzero on S 2. This is all we ll say about this for now, but in the end this gives some nice geometric meaning to Clairaut s Theorem. Again, take a differential geometry course to learn more. April 6, 2015: Clairaut s Theorem, Differentiability Today we proved Clairaut s Theorem and then started talking about differentiability in the higherdimensional setting. Here differentiability is defined in terms of matrices, and truly gets at the idea which derivatives in general are meant to capture. Warm-Up. Consider the function f : R 2 R defined by { 2x 3 +3y 4 (x, y) (0, 0) f(x, y) = x 2 +y 2 0 (x, y) = (0, 0). from the Warm-Up last time. We now show that the second-order derivatives f yy (0, 0) and f yx (0, 0) exist and determine their values. First, for (x, y) (0, 0) we have f y (x, y) = (x2 + y 2 )(12y 3 ) 2y( 2x 3 + 3y 4 ) (x 2 + y 2 ) 2 = 12x2 y 3 + 6y 5 4x 3 y (x 2 + y 2 ) 2. Since f(0, y) = 3y 2 for all y, we get f y (0, 0) = 0 after differentiating with respect to y and setting y = 0. (Note one subtlety: the expression f(0, y) = 3y 2 was obtained by setting x = 0 in the fraction defining f for (x, y) (0, 0) so that technically at this point we only have f(0, y) = 3y 2 for y 0. But since this happens to also give the correct value for f(0, 0) = 0, we indeed have f(0, y) = 3y 2 for all y as claimed.) Thus the function f y is defined by { 12x 2 y 3 +6y 5 4x 3 y (x, y) (0, 0) (x f y (x, y) = 2 +y 2 ) 2 0 (x, y) = (0, 0). 13

14 Now, from this we see that f y (0, y) = 6y for all y (including y = 0), so that f yy (0, 0) = 6. (We can also use the fact that f yy should be the ordinary second derivative of the single variable function f(0, y) = 3y 2.) The second-order derivative f yx (0, 0) should be the ordinary derivative at x = 0 of the single-variable function obtained by holding y constant at 0 in f y (x, y), so since f y (x, 0) = 0 for all x including x = 0, we get f yx (0, 0) = 0 too. Thus both second-order partial derivatives f yy (0, 0) and f yx (0, 0) exist and equal 6 and 0 respectively. Back to Clairaut s Theorem. Recall the statement of Clairaut s Theorem: if f : V R m is C 2 on an open subset V of R n, then f xi x j (a, b) = f xj x i (a, b) at any (a, b) V for any i and j. We now give a proof in the case where V is a subset of R 2, so that f is a function of two variables x and y, and where m = 1. This is only done to keep the notation simpler, but the proof in the most general case is very similar. Before giving the proof, let us talk about the ideas which go into it. Fix (a, b) V. For small values of h and k (small enough to guarantee that (a + h, b + k), (a + h, b), and (a, b + k) are all still in domain of f; this is where openness of V is used and is where the book gets the requirement that h, k < r/ 2 ) we introduce the function (h, k) = f(a + h, b + k) f(a, b + k) f(a + h, b) + f(a, b), which measures the behavior of f at the corners of a rectangle: The proof amounts to computing (h, k) lim (h,k) (0,0) hk in two ways: computing it one way gives the value f yx (a, b) and computing it the other way gives f xy (a, b), so since these two expressions equal the same limit, they must equal each other as Clairaut s Theorem claims. This limit is computed in the first way by focusing on what s happening in the picture above vertically and then horizontally, and in the second way by focusing on what s happening horizontally and then vertically. To be precise, both computations come from the applying the single-variable Mean Value Theorem in one direction and then in the other, and so the proof is a reflection of the idea mentioned last time that sometimes we can check the behavior of a function in one direction at a time to determine its overall behavior. 14
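Before the detailed proof, here is a numerical sketch (not part of the original notes) of the quantity ∆(h, k)/(hk) just introduced. For a smooth test function of my own choosing, this quotient approaches the common value of the mixed second-order partials as (h, k) → (0, 0), which is exactly what the two Mean Value Theorem computations below establish.

```python
import numpy as np

def second_difference_quotient(f, a, b, h, k):
    """Delta(h, k) / (hk), where Delta is the four-corner difference from the proof."""
    delta = f(a + h, b + k) - f(a, b + k) - f(a + h, b) + f(a, b)
    return delta / (h * k)

f = lambda x, y: np.sin(x * y) + x**3 * y                  # a smooth test function
a, b = 0.5, 1.2
exact = np.cos(a * b) - a * b * np.sin(a * b) + 3 * a**2   # f_xy(a, b) computed by hand
for h in [1e-1, 1e-2, 1e-3]:
    print(h, second_difference_quotient(f, a, b, h, h), exact)
```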

15 Looking at the definition of (h, k), note that the first and third terms have the same first input and only differ in their second input; applying the Mean Value Theorem in the y-coordinate to these gives f(a + h, b + k) f(a + h, b) = f y (a + h, c)k for some c between b and b + k. Similarly, applying the Mean Value Theorem in the y-coordinate to the second and fourth terms in the definition of (h, k) gives f(a, b + k) f(a, b) = f y (a, d)k for some d between b and b + k, so that overall we get (h, k) = f y (a + h, c)k f ( a, d)k = k[f y (a + h, c) f y (a, d)]. At this point we would like to now apply the Mean Value Theorem in the x-coordinate to get an expression involving f yx, but the problem is that the two terms above also differ in their y- coordinates; we can only apply the single-variable Mean Value Theorem to expressions where the other coordinate is the same. In the proof below we will see how to guarantee we can take c and d here to be the same this is why we use the functions F and G defined below but this is a point which the book glosses over without explaining fully. Apart from this though, the book s approach works fine, but hopefully we ll make it a little easier to follow. Proof of Clairaut s Theorem. For h and k small enough we define (h, k) = f(a + h, b + k) f(a, b + k) f(a + h, b) + f(a, b) and compute lim (h,k) (0,0) (h,k) hk in two ways. First, introduce the single-variable function F (y) = f(a + h, y) f(a, y). Then we have (h, k) = F (b + k) F (b). By the single-variable Mean Value Theorem, there exists c between b and b + k such that which gives F (b + k) F (b) = F y (c)k, (h, k) = k[f y (a + h, c) f y (a, c)]. (The book uses the fact that any number c between b and b + k can be written as c = b + tk for some t (0, 1) in its proof.) Now, applying the single-variable Mean Value Theorem again in the x-coordinate gives the existence of l between a and a + h such that f y (a + h, c) f y (a, c) = f yx (l, c)h. (The book writes l as l = a + sh for some s (0, 1).) Thus we have that (h, k) = khf yx (l, c) so (h, k) hk = f yx (l, c). Since l is between a and a + h and c is between b and b + k, (l, c) (a, b) as (h, k) (0, 0) so since f yx is continuous at (a, b) we have: (h, k) lim = lim (h,k) (0,0) hk f yx(l, c) = f yx (a, b). (h,k) (0,0) 15

16 Now go back to the original definition of (h, k) and introduce the single-variable function G(x) = f(x, b + k) f(x, b). Then (h, k) = G(a + h) G(a). By the Mean Value Theorem there exists m between a and a + h such that G(a + h) G(a) = G x (m)h, which gives (h, k) = G(a + h) G(a) = G x (m)h = h[f x (m, b + k) f x (m, b)]. By the Mean Value Theorem again there exists n between b and b + k such that so f x (m, b + k) f x (m, b) = f xy (m, n)k, (h, k) = h[f x (m, b + k) f x (m, b)] = hkf xy (m, n). As (h, k) (0, 0), (m, n) (a, b) since m is between a and a + h and n between b and b + k, so the continuity of f xy at (a, b) gives: (h, k) lim = lim (h,k) (0,0) hk f xy(m, n) = f xy (a, b). (h,k) (0,0) Thus f xy (a, b) = f yx (a, b) since these both equal the same limit lim (h,k) (0,0) (h, k)/hk. Motivating higher-dimensional derivatives. The idea of viewing derivatives as slopes for single-variable functions is nice visually, but doesn t capture the correct essence of what differentiability means in higher-dimensions. (It is also harder to visualize what slope might mean in these higher-dimensional settings.) Instead, we use another point of view the linear approximation point of view of single-variable derivatives to motivate the more general notion of differentiability. If f : R R is differentiable at a R, then values of f at points near a are pretty well approximated by the tangent line to the graph of f at the point a; to be clear, we have that f(a + h) f(a) + f (a)h for small h, where the expression on the right is the usual tangent line approximation. Rewriting gives f(a + h) f(a) f (a)h again for small h. Here s the point: if we view h as describing the small difference between the inputs a and a + h, then the expression on the left f(a + h) f(a) gives the resulting difference between the outputs at these inputs. Thus we have: (change in output) f (a)(change in input), and this approximation gets better and better as h 0. Thus, we can view the single-variable derivative f (a) at a as the object which tells us how to go from small changes in input to corresponding changes in output, or even better: f (a) transforms infinitesimal changes in input into infinitesimal changes in output. From this point of view, f (a) is not simply a number but is better thought of as the transformation obtained via multiplication by that number. 16
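Here is a quick numerical illustration (not in the original notes) of the linear-approximation viewpoint just described: for a differentiable single-variable function, the error in the approximation f(a + h) ≈ f(a) + f′(a)h is small even compared to h, meaning the ratio error/h tends to 0. The particular choice of sin is just for convenience.

```python
import numpy as np

f, df = np.sin, np.cos        # a function together with its derivative
a = 1.0
for h in [1e-1, 1e-2, 1e-3, 1e-4]:
    error = f(a + h) - (f(a) + df(a) * h)    # actual change minus the linear prediction
    print(h, error / h)                      # this ratio tends to 0 as h -> 0
```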

17 And now we claim that the same idea works for functions f : R n R m as well. The derivative of such an f at some a R n should be something which transforms small changes in input into corresponding changes in output, or said another way transforms infinitesimal changes in input into infinitesimal changes in output. But changes in inputs in this case look like (a + h) a, which is a difference of vectors and hence is itself a vector, and similarly changes in outputs look like f(a + h) f(a), which is also a vector quantity. Thus the derivative of f at a, whatever it is, should be something which transforms (infinitesimal) input vectors into (infinitesimal) output vectors. Throwing in the point of view that derivatives should also be linear objects in some sense gives the conclusion that the derivative of f at a should be a linear transformation, i.e. a matrix! This will lead us to the following definition, where indeed the derivative of f : R n R m at a R n is not just a single number, but is rather an entire matrix of numbers. Differentiability. Suppose that f : V R m is defined on some open subset V of R n. We say that f is differentiable at a V if there exists an m n matrix B such that f(a + h) f(a) Bh lim = 0. h 0 h We will soon see that if there is a matrix with this property, there is only one and we can give an explicit description of what it has to be. We call this matrix the (total) derivative of f at a, or the Jacobian matrix of f at a. In a sense, this matrix geometrically captures the slopes of f in all possible directions. This limit precisely captures the intuition we outlined earlier. The first two terms in the numerator f(a + h) f(a) measure the change in outputs and the Bh term measures the approximate change in output corresponding to the change in input h; saying that this limit is 0 means that the approximated change in output gets closer and closer to the actual change in output as h 0. Note that the numerator has limit 0 as long as f is continuous, so the h in the denominator is there to guarantee that the overall limit is zero not just because of the continuity of f but rather as a result of of how well f is approximated by the linear transformation given by B. The single-variable case. Finally, we note that the definition of differentiability for a function f : R n R m give above becomes the usual notion of differentiability in the case n = m = 1. So, suppose that f : R R is differentiable at a R in the above sense. Then there exists a 1 1 matrix B such that f(a + h) f(a) Bh lim = 0. h 0 h Denote the single entry in B by b, so that B = ( b ) as a matrix and thus Bh = bh in the limit expression above. Then we can rewrite this expression as: ( ) f(a + h) f(a) bh f(a + h) f(a) f(a + h) f(a) 0 = lim = lim b = lim b. h 0 h h 0 h h 0 h Thus we see that the remaining limit on the right exists and equals b: f(a + h) f(a) lim = b, h 0 h 17

which says that f is differentiable in the sense we saw in first-quarter analysis and that f′(a) = b. Hence, this new notion of differentiability really is a generalization of the previous version for single-variable functions, and the derivative in this new sense viewed as a matrix agrees with the ordinary derivative in the previous sense for single-variable functions.

Important. A function f : V → R^m defined on an open subset V of R^n is differentiable at a ∈ V if there exists an m × n matrix B such that

‖f(a + h) − f(a) − Bh‖ / ‖h‖ → 0 as h → 0.

We call B the derivative (or Jacobian matrix) of f at a and this definition captures the idea that B transforms infinitesimal changes in input into infinitesimal changes in output. When n = m = 1, this agrees with the usual notion of a derivative for single-variable functions. We also use Df(a) to denote the Jacobian matrix of f at a.

April 8, 2015: More on Differentiability

Today we continued talking about differentiability of multivariable functions, focusing on examples and general properties. The main result is an explicit description, in terms of partial derivatives, of the Jacobian matrices which are used in the definition of differentiability.

Warm-Up 1. We show that the function f : R^2 → R defined by f(x, y) = x^2 + y^2 is differentiable at (0, 1) using B = ( 0 2 ) as the candidate for the Jacobian matrix. (We will see in a bit how to determine that this is the right matrix to use.) So, we must verify that for this matrix B we have:

lim_{h→0} ‖f(a + h) − f(a) − Bh‖ / ‖h‖ = 0,

where a = (0, 1) and in Bh we think of h = (h, k) as a column vector. The numerator is:

|f(h, k + 1) − f(0, 1) − ( 0 2 )(h, k)^T| = |h^2 + (k + 1)^2 − 1 − 2k| = h^2 + k^2.

Thus:

lim_{h→0} ‖f(a + h) − f(a) − Bh‖ / ‖h‖ = lim_{(h,k)→(0,0)} (h^2 + k^2)/√(h^2 + k^2) = lim_{(h,k)→(0,0)} √(h^2 + k^2) = 0

as required. Thus f is differentiable at (0, 1) and Df(0, 1) = ( 0 2 ).

Warm-Up 2. We show that the function f : R^2 → R^2 defined by f(x, y) = (x^2 + y^2, xy + y) is differentiable at (0, 1) using

B = ( 0 2 )
    ( 1 1 )

as the candidate for the Jacobian matrix. Setting a = (0, 1) and h = (h, k), we compute:

f(a + h) − f(a) − Bh = f(h, k + 1) − f(0, 1) − B(h, k)^T
= (h^2 + (k + 1)^2, h(k + 1) + k + 1) − (1, 1) − (2k, h + k)
= (h^2 + k^2, hk).

Note that when computing the product Bh we have written the result as a row vector so that we can combine it with the f(a + h) − f(a) part of the expression.

19 Now we claim that f(a + h) f(a) Bh (h 2 + k 2, hk) lim = lim = 0. h 0 h (h,k) (0,0) h 2 + k 2 For this we need to show that the limit of each component function h 2 + k 2 h 2 + k 2 and hk h 2 + k 2 is zero. But the limit of the first component is precisely the one we considered in the first Warm-Up, where we showed that it was indeed zero. This is no accident: the first component x 2 + y 2 of our function f in this case was the function we looked at previously, and our work here shows that a function is differentiable if and only if each component function is differentiable. So, we can use the result of the first Warm-Up to say that the first component of f here is differentiable, so we need only consider the second component. For this, we use 2 hk h 2 + k 2 to say hk 1 h 2 + k 2 h 2 + k 2 2 h 2 + k = 1 h k 2, 2 and since this final expression goes to 0 as (h, k) (0, 0), the squeeze theorem gives lim (h,k) (0,0) hk h 2 + k 2 = 0 as desired. We conclude that f is differentiable at (0, 1) and that Df(0, 1) = ( ). Important. A function f : V R m written in components as f = (f 1,..., f m ) is differentiable at a V R n if and only if each component function f i : V R is differentiable at a V. Differentiability implies continuity. And now we verify something we would hope is true: differentiability implies continuity. Recall that existence of all partial derivatives at a point alone is not enough to guarantee continuity at that point, but the stronger version of differentiability we ve given will make this work. This will also be our first instance of using the norm of a linear transformation to bound expressions involving multivariable derivatives. Suppose that f : V R m is differentiable at a V R n. We must show that lim f(a + h) = f(a). h 0 Since f is differentiable at a there exists an m n matrix B such that Then there exists δ > 0 such that f(a + h) f(a) Bh lim = 0. h 0 h f(a + h) f(a) Bh h < 1 when 0 < h < δ. Thus for such h we have f(a + h) f(a) Bh h, which gives f(a + h) f(a) h + Bh 19

20 after using the reverse triangle inequality f(a + h) f(a) Bh f(a + h) f(a) Bh. Since Bh B h where B denotes the norm of the linear transformation induced by B, we get f(a + h) f(a) (1 + B ) h, which goes to 0 as h 0. Thus f(a + h) f(a) 0, so f(a + h) f(a) as h 0 as was to be shown. Hence f is continuous at a. Description of Jacobian matrices. The definition of differentiable at a point requires the existence of a certain matrix, but it turns out that we can determine which matrix this has to be. In particular, there can only be one matrix B satisfying the requirement that f(a + h) f(a) Bh lim = 0 h 0 h and its entries have to consist of the various partial derivatives of f. Thus, when determining whether or not a function is differentiable, we first find this matrix which requires that the partial derivatives of our function all exist and then try to compute the limit above where B is this matrix. To be precise, suppose that f : V R m is differentiable at a V R n. We claim that then all partial derivatives of f at a exist, and moreover that Df(a) is the matrix whose ij-th entry is the value of f i x j (a) where f i is the i-th component of f. The proof is in the book, but we ll include it here anyway. Since f is differentiable at a we know that there is an m n matrix B such that f(a + h) f(a) Bh lim = 0. h 0 h Since this limit exists, we should get the same limit no matter the direction from which we approach 0. Approaching along points of the form h = he i where e i is the standard basis vector with 1 in the i-th entry and zeroes elsewhere so we are approaching 0 along the x i -axis we have: f(a + he i ) f(a) B(he i ) lim = 0 h 0 h where in the denominator we use the fact that he i = h. For h > 0 we have h = h so f(a + he i ) f(a) B(he i ) h = f(a + he i) f(a) h hbe i h = f(a + he i) f(a) h Be i, while for h < 0 we have h = h so f(a + he i ) f(a) B(he i ) h = f(a + he i) f(a) h hbe i h = f(a + he i) f(a) h + Be i,. Thus since we get that f(a + he i ) f(a) B(he i ) lim = 0 h 0 h f(a + he i ) f(a) lim = Be i. h 0 h 20

The left side is the definition of the partial derivative ∂f/∂x_i(a) = (∂f_1/∂x_i(a), ..., ∂f_m/∂x_i(a)), so it exists, and the right side is precisely the i-th column of B. Thus the i-th column of B is (∂f_1/∂x_i(a), ..., ∂f_m/∂x_i(a)) written as a column vector, so

B = ( ∂f_1/∂x_1(a) · · · ∂f_1/∂x_n(a) )
    (       ...                ...    )
    ( ∂f_m/∂x_1(a) · · · ∂f_m/∂x_n(a) )

as claimed.

Example. Define f : R^2 → R by

f(x, y) = (x^3 + y^3)/√(x^2 + y^2) for (x, y) ≠ (0, 0), and f(0, 0) = 0.

We claim that f is differentiable at (0, 0). Since the matrix needed in the definition of differentiability must be the one consisting of the partial derivatives of f at (0, 0), we must first determine the values of these partials. We have:

f_x(0, 0) = lim_{h→0} [f(h, 0) − f(0, 0)]/h = lim_{h→0} h^3/(h|h|) = lim_{h→0} |h| = 0

and

f_y(0, 0) = lim_{h→0} [f(0, h) − f(0, 0)]/h = lim_{h→0} h^3/(h|h|) = 0.

Thus Df(0) = ( 0 0 ). Now we must show that

lim_{h→0} ‖f(0 + h) − f(0) − Df(0)h‖ / ‖h‖ = 0.

Setting h = (h, k), the numerator is

|f(h, k) − f(0, 0) − ( 0 0 )(h, k)^T| = |h^3 + k^3| / √(h^2 + k^2).

Thus

‖f(0 + h) − f(0) − Df(0)h‖ / ‖h‖ = |h^3 + k^3| / (h^2 + k^2),

and converting to polar coordinates (h, k) = (r cos θ, r sin θ) gives

|h^3 + k^3| / (h^2 + k^2) = r |cos^3 θ + sin^3 θ|,

which goes to 0 as r → 0 since the cos^3 θ + sin^3 θ part is bounded. Hence we conclude that f is differentiable at (0, 0) as claimed.

Important. To check if a function f is differentiable at a, we first compute all partial derivatives of f at a; if at least one of these does not exist then f is not differentiable at a. If they all exist, we form the matrix Df(a) having these as entries and then check whether or not the limit

lim_{h→0} ‖f(a + h) − f(a) − Df(a)h‖ / ‖h‖

is zero. If it is zero, then f is differentiable at a and hence also continuous at a.
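The recipe in the "Important" box translates directly into a numerical sanity check, sketched below (not part of the original notes). It uses the example just discussed, with f(x, y) = (x^3 + y^3)/√(x^2 + y^2) away from the origin, and evaluates the quotient ‖f(h) − f(0) − Df(0)h‖/‖h‖ at points shrinking toward the origin; the values tend to 0, consistent with differentiability at the origin.

```python
import numpy as np

def f(x, y):
    """f(x, y) = (x^3 + y^3)/sqrt(x^2 + y^2) away from the origin, f(0, 0) = 0."""
    return 0.0 if x == 0.0 and y == 0.0 else (x**3 + y**3) / np.sqrt(x**2 + y**2)

Df0 = np.array([0.0, 0.0])    # the partial derivatives at the origin computed above

rng = np.random.default_rng(0)
direction = rng.standard_normal(2)
for t in [1e-1, 1e-2, 1e-3, 1e-4]:
    h = t * direction                                          # a point approaching 0
    quotient = abs(f(*h) - f(0.0, 0.0) - Df0 @ h) / np.linalg.norm(h)
    print(t, quotient)                                         # tends to 0
```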

April 10, 2015: Yet More on Derivatives

Yet another day talking about derivatives! Today we showed that having continuous partial derivatives implies being differentiable, which oftentimes gives a quick way to show differentiability without having to compute a limit. Take note, however, that having non-continuous partial derivatives does NOT imply non-differentiability. We also mentioned properties of higher-dimensional differentiable functions related to sums and products, which are analogous to properties we saw for single-variable functions.

Warm-Up 1. We show that the function

f(x, y) = (2x^3 + 3y^4)/(x^2 + y^2) for (x, y) ≠ (0, 0), and f(0, 0) = 0

is not differentiable at (0, 0). We computed the partial derivatives f_x(0, 0) and f_y(0, 0) in a previous Warm-Up, where we found that f_x(0, 0) = 2 and f_y(0, 0) = 0. Thus the candidate for the Jacobian matrix is Df(0, 0) = ( 2 0 ). For this matrix we have:

f(0 + h) − f(0) − Df(0)h = (2h^3 + 3k^4)/(h^2 + k^2) − 2h = (−2hk^2 + 3k^4)/(h^2 + k^2),

so

|f(0 + h) − f(0) − Df(0)h| / ‖h‖ = |−2hk^2 + 3k^4| / (h^2 + k^2)^{3/2}.

However, taking the limit as we approach 0 along h = k gives

lim_{h→0} (−2h^3 + 3h^4)/(2√2 h^2 |h|),

which does not exist since for h → 0^+ this limit is −1/√2 while for h → 0^− it is 1/√2. Thus this limit is not zero, so

lim_{h→0} ‖f(0 + h) − f(0) − Df(0)h‖ / ‖h‖

is also not zero. (In fact it does not exist.) Hence f is not differentiable at 0.

Warm-Up 2. Suppose that f : R^n → R satisfies |f(x)| ≤ ‖x‖^2 for all x ∈ R^n. We show that f is differentiable at 0. First, note that |f(0)| ≤ ‖0‖^2 = 0 implies f(0) = 0. Now, we have |f(te_i)| ≤ ‖te_i‖^2 = t^2 for all i, where e_i is a standard basis vector, so

|f(0 + te_i) − f(0)| / |t| = |f(te_i)| / |t| ≤ |t|.

23 The right side has limit 0 as t 0, so the squeeze theorem implies that f f(0 + te i ) f(0) (0) = lim = 0. x i t 0 t Hence the candidate for the Jacobian matrix of f at 0 is the zero matrix Df(0) = 0. Thus f(0 + h) f(0) Df(0)h = f(h) 0 0 = f(h), so f(0 + h) f(0) Df(0)h h = f(h) h This has limit 0 as h 0, so the squeeze theorem again gives and hence f is differentiable at 0 as claimed. f(0 + h) f(0) Df(0)h lim = 0 h 0 h h 2 h = h. C 1 implies differentiable. The definition we have of differentiable is at times tedious to work with, due to the limit involved, but is usually all we have available. However, there is nice scenario in which we can avoid using this definition directly, namely the scenario when we re looking at a C 1 function. To be precise, suppose that all partial derivatives of f : V R m exist and are continuous at a V R n. Then the result is that f is differentiable at a. However, note that the converse is not true: if f is differentiable at a it is NOT necessarily true that the partial derivatives of f are continuous at a. In other words, just because a function has non-continuous partial derivatives at a point does not mean it is not differentiable at that point; in such cases we must resort to using the limit definition of differentiability. Indeed, the result we re stating here will really only be useful for functions whose partial derivatives are simple enough so that determining their continuity is relatively straightforward. For most examples we ve seen this is not the case, so most of the time we ll have to use the limit definition anyway, or some of the other properties we ll soon mention. This result is proved in the book, but the proof can be a little hard to follow. So here we ll give the proof only in the n = 2, m = 1 case when V R 2, which is enough to illustrate the general procedure. The main idea, as in the proof of Clairaut s Theorem, is to rewrite the expression we want to take the limit of in the definition of differentiability by applying the single-variable Mean Value Theorem one coordinate at a time. Proof. Suppose that the partial derivatives of f : V R at (a, b) V R 2 both exist and are continuous. For h, k small enough so that (a + h, b + k), (a, b + k), and (a + h, k) also lie in V, write f(a + h, b + k) f(a, b) as f(a + h, b + k) f(a, b + k) + f(a, b + k) f(a, b) by subtracting and adding f(a, b + k). The first two terms have the same second input, so applying the Mean Value Theorem in the x-direction gives f(a + h, b + k) f(a, b + k) = f x (c, b + k)h for some c between a and a + h. The second two terms in the previous expression have the same first input, so applying the Mean Value Theorem in the y-direction gives f(a, b + k) f(a, b) = f y (a, d)k 23

24 for some d between b and b + k. All together we thus have f(a + h, b + k) f(a, b) = f x (c, b + k)h + f y (a, d)k = ( f x (c, b + k) f y (a, d) ) ( ) h k where at the end we have written our expression as a matrix product. This gives ( ) h f(a + h, b + k) f(a, b) Df(a, b) = ( f k x (c, b + k) f x (a, b) f y (a, d) f y (a, b) ) ( ) h k where we use Df(a, b) = ( f x (a, b) f y (a, b) ). Thus setting h = (h, k), we have ( f(a + h) f(a) Df(a)h fx (c, b + k) f x (a, b) f y (a, d) f y (a, b) ) ( h k) = ( h h k) ( f x (c, b + k) f x (a, b) f y (a, d) f y (a, b) ) ( h k) ( h k) = ( f x (c, b + k) f x (a, b) f y (a, d) f y (a, b) ) where in the second step we use Bh B h. As (h, k) (0, 0), (c, b + k) (a, b) since c is between a and a + h and (a, d) (a, b) since d is between b and b + h. Thus since f x and f y are continuous at (a, b), we have ( fx (c, b + k) f x (a, b) f y (a, d) f y (a, b) ) ( f x (a, b) f x (a, b) f y (a, b) f y (a, b) ) = ( 0 0 ), so ( f x (c, b + k) f x (a, b) that f y (a, d) f y (a, b) ) 0 as h 0. The inequality above thus implies f(a + h) f(a) Df(a)h lim = 0 h 0 h by the squeeze theorem, so f is differentiable at a as claimed. Converse not true. An important word of warning: the converse of the above result is NOT true. That is, just because a function might have non-continuous partial derivatives at a point does NOT guarantee that it is not differentiable there. Indeed, the function (x 2 + y 2 ) sin 1 (x, y) (0, 0) f(x, y) = x 2 +y 2 0 (x, y) = (0, 0) is differentiable at (0, 0) even though its partial derivatives are not continuous at (0, 0), as shown in Example in the book. (Think of this function as an analog of the single-variable function f(x) = x 2 sin 1 x we saw back in first-quarter analysis, as an example of a differentiable function with non-continuous derivative.) Example. Consider the function x 3 +y 3 (x, y) (0, 0) f(x, y) = x 2 +y 2 0 (x, y) = (0, 0). 24

Last time we showed this was differentiable at the origin by explicitly working out the Jacobian matrix and verifying the limit definition, but now we note that we can also conclude differentiability by verifying that the partial derivatives of f are both continuous at (0, 0). The partial derivative of f with respect to x is explicitly given by

f_x(x, y) = [3x^2 √(x^2 + y^2) − x(x^3 + y^3)/√(x^2 + y^2)] / (x^2 + y^2) = [3x^2 (x^2 + y^2) − x(x^3 + y^3)] / (x^2 + y^2)^{3/2}

for (x, y) ≠ (0, 0), and f_x(0, 0) = 0. Converting to polar coordinates x = r cos θ and y = r sin θ, we have

f_x(x, y) = [3r^4 cos^2 θ − r^4 cos θ(cos^3 θ + sin^3 θ)] / r^3 = r · (some bounded expression),

which goes to 0 as r → 0. Thus f_x(x, y) → 0 = f_x(0, 0) as (x, y) → (0, 0), so f_x is continuous at (0, 0) as claimed. The justification that f_y is continuous at (0, 0) involves the same computation, only with the roles of x and y reversed. Since the partial derivatives of f are both continuous at (0, 0), f is differentiable at (0, 0).

Important. If a function has continuous partial derivatives at a point, then it is automatically differentiable at that point. HOWEVER, a function can still be differentiable at a point even if its partial derivatives are not continuous there.

More properties. Finally, we list some properties which allow us to construct new differentiable functions out of old ones, analogous to properties we saw in first-quarter analysis for single-variable functions. Suppose that f, g : V → R^m are both differentiable at a ∈ V ⊆ R^n. Then:

f + g is differentiable at a and D(f + g)(a) = Df(a) + Dg(a),
cf is differentiable at a and D(cf)(a) = c Df(a) for c ∈ R,
f · g is differentiable at a and D(f · g)(a) = f(a)Dg(a) + g(a)Df(a).

The first two say that the derivative of a sum is the sum of derivatives and that constants can be pulled out of derivatives, respectively. The third is a version of the product rule and requires some explanation. The function f · g : V → R in question is the dot product of f and g defined by (f · g)(x) = f(x) · g(x). The Jacobian matrix of this dot product is obtained via the "product rule"-like expression given in the third property, where the right side of that expression consists of matrix products: f(a) is an element of R^m and so is a 1 × m matrix, and Dg(a) is an m × n matrix, so f(a)Dg(a) is the ordinary matrix product of these and results in a 1 × n matrix; similarly g(a)Df(a) is also a matrix product which gives a 1 × n matrix, so the entire right-hand side of that expression is a row vector in R^n, just as the Jacobian matrix of the map f · g : V ⊆ R^n → R should be.

There is also a version of the quotient rule for Jacobian matrices of functions R^n → R, and a version of the product rule for the cross product of functions f, g : R^n → R^m defined by (f × g)(x) = f(x) × g(x). These are given in some of the exercises in the book, but we won't really use them much, if at all. The main point for us is that these all together give yet more ways of justifying that various functions are differentiable.

Important. Sums, scalar multiples, and (appropriately defined) products of differentiable functions are differentiable, and the Jacobian matrices of sums, scalar multiples, and products obey similar differentiation rules as do single-variable derivatives.
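To illustrate the dot-product rule just stated, here is a small numerical sketch (not from the original notes; the particular maps f and g are arbitrary choices). It estimates all the Jacobians by central finite differences and compares D(f · g)(a) with f(a)Dg(a) + g(a)Df(a); the two rows agree up to discretization error.

```python
import numpy as np

def jacobian(F, a, h=1e-6):
    """Central finite-difference Jacobian of F : R^n -> R^m at a, as an m x n matrix."""
    a = np.asarray(a, dtype=float)
    cols = [(F(a + h * e) - F(a - h * e)) / (2 * h) for e in np.eye(len(a))]
    return np.column_stack(cols)

f = lambda v: np.array([v[0]**2 + v[1], np.sin(v[0] * v[1])])   # f : R^2 -> R^2
g = lambda v: np.array([np.exp(v[1]), v[0] - v[1]**3])          # g : R^2 -> R^2
dot = lambda v: np.array([f(v) @ g(v)])                         # f . g : R^2 -> R

a = np.array([0.4, -1.1])
lhs = jacobian(dot, a).ravel()                        # D(f . g)(a), a 1 x 2 row
rhs = f(a) @ jacobian(g, a) + g(a) @ jacobian(f, a)   # f(a) Dg(a) + g(a) Df(a)
print(lhs, rhs)                                       # the two rows agree
```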

26 April 13, 2015: The Chain Rule Today we spoke about the chain rule for higher-dimensional derivatives phrased in terms of Jacobian matrices. This is the most general version of the chain rule there is and subsumes all versions you might have seen in a multivariable calculus course. Warm-Up. Define f : R 4 R 4 by f(x, y, z, w) = (x 2 + yz, xy + yw, xz + wz, zy + w 2 ). We claim that f is differentiable everywhere. Indeed, one way to justify this is to note that all the partial derivatives of f are polynomial expressions, and so are continuous everywhere. Or, we can say that each component of f is made up by taking sums and products of differentiable functions and so are each differentiable. The point is that we can justify that f is differentiable without any hard work using other tools we ve built up. So that s it for the Warm-Up, but see below for why we looked at this example; in particular, this function isn t just some random function I came up with, but has some actual meaning. Derivatives of matrix expressions. All examples we ve seen of differentiable functions in higher dimensions were thought up of in order to illustrate how to use a certain definition or property, but aren t truly representative of all types of functions you might seen in actual applications. So, here we see how to apply the material we ve been developing to a more interesting and relevant example, motivated by the fact that higher-dimensional functions expressed in terms of matrices tend to turn up quite a bit in concrete applications. This is a bit of a tangent and isn t something we ll come back to, but hopefully it illustrates some nice ideas. Let M 2 (R) denote the space of 2 2 matrices and let f : M 2 (R) M 2 (R) denote the squaring function defined by f(x) = X 2 for X M 2 (R). Here X 2 denotes the usual matrix product XX. We want to make sense of what it means for f to be differentiable and what the derivative of f should be. (The idea is that we want to differentiate the matrix expression X 2 with respect to the matrix X.) Going by what we know about the analogous function g(x) = x 2 on R we might guess that the derivative of f should be f (X) = 2X, which is not quite right but understanding what this derivative actually is really serves to illustrate what differentiability means in higher dimensions. First we note that we can identity M 2 (R) with R 4 by associating to a 2 2 matrix the vector in R 4 whose coordinates are the entries in that matrix: ( ) x y (x, y, z, w). z w So, under this identification, the squaring map we re looking at should really be thought us as a function f : R 4 R 4. Concretely, since ( ) 2 ( ) ( ) x y x y x y = = z w z w z w the squaring function f : R 4 R 4 is given by: ( x 2 ) + yz xy + yw xz + wz yz + w 2, f(x, y, z, w) = (x 2 + yz, xy + yw, xz + wz, yz + w 2 ), 26

which lo-and-behold is the function we looked at in the Warm-Up! Now we see that the point of that Warm-Up was to show that this squaring function was differentiable everywhere. Now, the Jacobian of this squaring function at x = (x, y, z, w) is given by:

Df(x) = [ 2x  z  y  0 ;  y  x+w  0  y ;  z  0  x+w  z ;  0  z  y  2w ]

(rows separated by semicolons). It is not at all clear that we can interpret this derivative as 2X as we guessed above might be the case. But remember that Df(x) is meant to represent a linear transformation, in this case a linear transformation from R⁴ to R⁴. When acting on a vector h = (h, k, l, m), this linear transformation gives:

Df(x)h = ( 2xh + zk + yl,  yh + xk + wk + ym,  zh + xl + wl + zm,  zk + yl + 2wm ).

Phrasing everything in terms of matrices again, this says that the Jacobian Df(X) of the squaring function is the linear transformation M₂(R) → M₂(R) defined by

H = [ h  k ;  l  m ]  ↦  [ 2xh + zk + yl   yh + xk + wk + ym ;  zh + xl + wl + zm   zk + yl + 2wm ].

Perhaps we can interpret our guess that f′(X) = 2X as saying that this resulting linear transformation should be given by the matrix 2X in the sense that H ↦ 2XH. However, a quick computation:

2XH = 2 [ x  y ;  z  w ] [ h  k ;  l  m ] = [ 2xh + 2yl   2xk + 2ym ;  2zh + 2wl   2zk + 2wm ]

shows that this is not the case. The crucial observation is that the matrix derived above when computing the linear transformation M₂(R) → M₂(R) induced by Df(X) is precisely:

[ 2xh + zk + yl   yh + xk + wk + ym ;  zh + xl + wl + zm   zk + yl + 2wm ] = [ x  y ;  z  w ][ h  k ;  l  m ] + [ h  k ;  l  m ][ x  y ;  z  w ],

so that Df(X) is the linear transformation M₂(R) → M₂(R) defined by H ↦ XH + HX. Thus, the conclusion is that the derivative f′(X) of the squaring map X ↦ X² at X ∈ M₂(R) is the linear transformation which sends a 2 × 2 matrix H to XH + HX! (Note that, in a sense, this derivative is almost like 2X = X + X in that there are two X terms which show up and are being added, only after multiplying by H on opposite sides. So, our guess wasn't totally off.)

With this in mind, we can now verify that f(X) = X² is differentiable at any X directly using the definition of differentiability, only phrased in terms of matrices:

lim_{H → 0} ‖f(X + H) − f(X) − Df(X)H‖ / ‖H‖ = lim_{H → 0} ‖(X + H)² − X² − (XH + HX)‖ / ‖H‖ = lim_{H → 0} ‖H²‖ / ‖H‖ = 0,

using ‖H²‖ ≤ ‖H‖² in the final step.
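As a quick numerical sanity check of this computation (the specific random matrices, the step size, and the use of numpy are my own choices, not from the notes), one can compare (X + tH)² − X² with t(XH + HX) for a small t; the leftover is exactly t²H², which is negligible compared to t:

    import numpy as np

    # check that X -> X^2 has derivative H -> XH + HX at a random X
    rng = np.random.default_rng(0)
    X = rng.standard_normal((2, 2))
    H = rng.standard_normal((2, 2))
    t = 1e-5
    increment   = (X + t*H) @ (X + t*H) - X @ X   # f(X + tH) - f(X)
    linear_part = t * (X @ H + H @ X)             # candidate derivative applied to tH
    print(np.linalg.norm(increment - linear_part) / t)   # ~ t * ||H^2||, essentially 0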

Note that (X + H)² = X² + XH + HX + H², so since XH ≠ HX in general we indeed need to have XH + HX in the Jacobian expression in order to have the numerator above simplify to ‖H²‖.

Finally, we note that a similar computation shows that the squaring map X ↦ X² for n × n matrices in general is differentiable everywhere and that the derivative at any X is the linear transformation H ↦ XH + HX. In particular, for 1 × 1 matrices the squaring map is the usual f(x) = x² and the derivative acts as h ↦ xh + hx = 2xh. Thus the 1 × 1 Jacobian matrix Df(x) representing this linear transformation is simply Df(x) = ( 2x ), which agrees with the well-known fact that f′(x) = 2x in this case. Thus the above results really do generalize what we know about f(x) = x². Using similar ideas you can then define differentiability and derivatives of more general functions expressed in terms of matrices. Huzzah!

Second Derivatives. As one more aside, we note that we can also make sense of what the second derivative (as opposed to the second-order partial derivatives) of a function f : R^n → R^m should mean. To keep notation cleaner, we'll only look at the f : R² → R case. As we've seen, the derivative of f at x = (x, y) is now interpreted as the 1 × 2 Jacobian matrix Df(x) = ( f_x(x)  f_y(x) ). We view this now as a function Df : R² → R² which assigns to any x ∈ R² the vector in R² given by this Jacobian matrix: Df : x ↦ Df(x). To say that f is twice-differentiable should mean that this first derivative map Df : R² → R² is itself differentiable, in which case the second derivative of f should be given by the Jacobian matrix of Df, which we think of as the derivative of the derivative of f:

D²f(x) := D(Df)(x) = [ f_xx(x)  f_xy(x) ;  f_yx(x)  f_yy(x) ],

where these entries are found by taking the partial derivatives of the components of the function Df : R² → R² defined above. This resulting matrix is more commonly known as the Hessian of f at x and is denoted by Hf(x); it literally does play the role of the second derivative of f in this higher-dimensional setting. And so on, you can keep going and define higher orders of differentiability in analogous ways.

Chain Rule. The single-variable chain rule says that derivatives of compositions are given by products of derivatives, and now we see that the same is true in the higher-dimensional setting. To be clear, suppose that g : R^n → R^m and f : R^m → R^p are functions with g differentiable at a ∈ R^n and f differentiable at g(a) ∈ R^m. (Of course, these functions could be defined on smaller open domains; the only requirement is that the image of g is contained in the domain of f so that the composition f ∘ g makes sense.) The claim is that f ∘ g is then differentiable at a as well and the Jacobian matrix of the composition is given by:

D(f ∘ g)(a) = Df(g(a)) Dg(a),

where the right side denotes the ordinary product of the p × m matrix Df(g(a)) with the m × n matrix Dg(a). Thus the derivative of f ∘ g is indeed the product of the derivatives of f and of g.

Note that in the case n = m = p = 1, all of the matrices involved are 1 × 1 matrices with ordinary derivatives as their entries, and the expression D(f ∘ g)(a) = Df(g(a))Dg(a) becomes

(f ∘ g)′(a) = f′(g(a)) g′(a),

which is the usual single-variable chain rule.

Intuition behind the chain rule. The chain rule is actually pretty intuitive from the point of view that derivatives are meant to measure how a small change in inputs into a function transforms into a small change in outputs. Intuitively, Dg(a) transforms an infinitesimal change in input at a into an infinitesimal change in output at g(a). But the outputs of g can then be fed in as inputs of f, so Df(g(a)) transforms that infinitesimal change at g(a) into an infinitesimal change in output at f(g(a)). Putting it all together, the product Df(g(a))Dg(a) tells us how to transform an infinitesimal change in input of f ∘ g into an infinitesimal change in output of f ∘ g, but this is precisely what the Jacobian matrix of f ∘ g at a should do as well, so it makes sense that we should have D(f ∘ g)(a) = Df(g(a))Dg(a) as the chain rule claims.

Deriving other chain rules. Note also that this statement of the chain rule encodes within it all other versions you would have seen in a multivariable calculus course. To be specific, suppose that f : R² → R is a function of two variables f(x, y) and that each of x = x(s, t) and y = y(s, t) themselves depend on some other variables s and t. In a multivariable calculus course you would have seen expressions such as:

∂f/∂t = (∂f/∂x)(∂x/∂t) + (∂f/∂y)(∂y/∂t).

30 We can view the fact that x and y depend on s and t as defining a map g : R 2 tor 2 : g(s, t) = (x(s, t), y(s, t)). Then according to the chain rule, the composition f g has Jacobian matrix given by: D(f g) = Df Dg = ( ) ( ) x f x f s x t y = ( ) f y s y x x s + f y y s f x x t + f y y t. t The second entry of this resulting 1 2 Jacobian should be f t, so we get f t = f x x t + f y y t, or f t = f x x t + f y y t as the well-known formula above claims. In a similar way, all other types of expressions you get when differentiating with respect to the variables which themselves depend on other variables (which themselves possibly depend on other variables, and so on) can be derived from this one chain rule we ve given in terms of Jacobian matrices. Important. For differentiable functions f and g for which the composition f g makes sense, the chain rule says that f g is differentiable and that its Jacobian matrices are given by D(f g)(a) = Df(g(a))Dg(a). Thus, the chain rule is essentially nothing but a statement about matrix multiplication! Proof of Chain Rule. The proof of the chain rule is in the book and is not too difficult to follow once you understand what the book is trying to do. So here we only give the basic idea and leave the full details to the book. The key idea is to rewrite the numerator in the limit f(g(a + h)) f(g(a)) Df(g(a))Dg(a)h lim h 0 h which says that f g is differentiable at a with Jacobian matrix Df(g(a))Dg(a) in a way which makes it possible to see that this limit is indeed zero. We introduce the error functions and ɛ(h) = g(a + h) g(a) Dg(a)h δ(k) = f(g(a) + k) f(g(a)) Df(g(a))k for g at a and f at g(a) respectively which measure how good/bad the linear approximations to f and g given by their first derivatives are. With this notation, note that saying g is differentiable at a and f is differentiable at g(a) translates to ɛ(h) δ(k) lim = 0 and lim h 0 h k 0 k = 0 respectively. (In other books you might see the definition of differentiability phrased in precisely this way.) After some manipulations, the book shows that the numerator we want to rewrite can be expressed in terms of these error functions as: f(g(a + h)) f(g(a)) Df(g(a))Dg(a)h = Df(g(a))ɛ(h) + δ(k). The proof is finished by showing that Df(g(a))ɛ(h) h and δ(k) h both go to 0 as h 0. Again, you can check the book for all necessary details. 30
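Since the chain rule is ultimately a statement about matrix multiplication, it is also easy to test numerically. The following sketch (the particular f, g, base point, and finite-difference scheme are my own choices, not the book's) compares the Jacobian of f ∘ g with the product of the Jacobians:

    import numpy as np

    # g : R^2 -> R^3 and f : R^3 -> R^2, chosen only for illustration
    def g(s): return np.array([s[0]*s[1], np.sin(s[0]), s[1]**2])
    def f(x): return np.array([x[0] + x[1]*x[2], np.exp(x[0]) - x[2]])

    def jac(F, a, h=1e-6):
        # forward-difference Jacobian of F at a
        a = np.asarray(a, float); F0 = F(a)
        return np.column_stack([(F(a + h*e) - F0)/h for e in np.eye(a.size)])

    a = np.array([0.4, -0.9])
    left  = jac(lambda s: f(g(s)), a)    # Jacobian of the composition at a
    right = jac(f, g(a)) @ jac(g, a)     # product of the Jacobians
    print(np.max(np.abs(left - right)))  # small: only finite-difference error remains

This also makes the bookkeeping of sizes visible: jac(f, g(a)) is 2 × 3, jac(g, a) is 3 × 2, and their product is the 2 × 2 Jacobian of the composition.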

31 April 15, 2015: Mean Value Theorem Today we spoke about the Mean Value Theorem in the higher-dimensional setting. For certain types of functions namely scalar-valued ons the situation is the same as what we saw for singlevariable functions in first-quarter analysis, but for other functions those which are vector-valued the statement of the Mean Value Theorem requires some modification. Geometric meaning of the chain rule. Before moving on, we give one more bit of intuition behind the chain rule, this time from a more geometric point of view. The interpretation of Jacobian matrices we ve seen in terms of measuring what a function does to infinitesimal changes in input does not seem to be as nice as the interpretation in the single-variable case in terms of slopes, but if we interpret slope in a better way we do get a nice geometric interpretation of Jacobians. The key is that we should no longer think of a slope as simply being a number, but rather as a vector! (We won t make this more precise, but we are considering what are called tangent vectors.) So, for a function f : R n R m we think of having an infinitesimal vector at a given point a. The behavior induced by the function f should transform this infinitesimal vector at a into an infinitesimal vector at f(a): and the point is that this infinitesimal transformation is described precisely by the Jacobian matrix Df(a)! That is, the matrix Df(a) tells us how to transform infinitesimal vectors at a into infinitesimal vectors at f(a), which is the geometric analog of the view that Df(a) transforms small changes in input into small changes in output. Now, given functions g : R n R m and f : R m R p, starting with an infinitesimal vector at a R n, after applying Dg(a) we get an infinitesimal vector at f(a), which then gets transformed by Df(g(a)) into an infinitesimal vector at f(g(a)): But the same infinitesimal vector should be obtained by applying the Jacobian of the composition f g, so we get that D(f g)(a) should equal Df(g(a))Dg(a) as the chain rules states. 31

The point of view that Jacobian matrices tell us how to transform infinitesimal vectors (or more precisely tangent vectors) would be further developed in a differential geometry course. Being a geometer myself, I'm probably biased, but I do believe that this point of view provides the best intuition behind the meaning of higher-dimensional derivatives and differentiability.

Warm-Up. Suppose that F : R² → R is differentiable at (a, b) and that F_y(a, b) ≠ 0. Suppose further that f : I → R is a differentiable function on some open interval I ⊆ R containing a, with f(a) = b, and that F(x, f(x)) = 0 for all x ∈ I. We claim that then the derivative of f at a is given by the expression

f′(a) = −F_x(a, b)/F_y(a, b).

Before proving this, here is the point of this setup and claim. Suppose that F is something like F(x, y) = x²y + y³ − xy + x − 1. Then the equation F(x, y) = 0 defines some curve in the xy-plane:

x²y + y³ − xy + x = 1.

The idea is that we should think of this equation as implicitly defining y = f(x) as a function of x, so that this curve looks like the graph of this function f. Even if we don't know what f explicitly is, the result of this Warm-Up says that we can nonetheless compute the derivatives of f in terms of the partial derivatives of F, giving us a way to find the slope of the curve at some given point as long as F_y is nonzero at that point. This is a first example of what's called the Implicit Function Theorem, which we will talk about in more generality later. Indeed, the result of this Warm-Up is the justification behind the method of implicit differentiation you might have seen in a previous calculus course.

Consider the function g : I → R² defined by g(x) = (x, f(x)) and look at the composition F ∘ g. On the one hand, this composition is constant since F(x, f(x)) = 0 for all x ∈ I, so its derivative is 0. On the other hand, the chain rule gives:

D(F ∘ g)(a) = DF(g(a)) Dg(a) = ( F_x(a, b)   F_y(a, b) ) (1, f′(a))^T = F_x(a, b) + F_y(a, b) f′(a).

Thus we must have 0 = F_x(a, b) + F_y(a, b) f′(a), and solving for f′(a) (which we can do since F_y(a, b) is nonzero) gives the desired conclusion.

Convexity. Recall that in the statement of the single-variable Mean Value Theorem, at the end we get the existence of c between some values x and a satisfying some equality. To get a higher-dimensional analog of this we first need to generalize what "between" means in R^n for n > 1. For x, a ∈ R^n we define L(x; a) to be the line segment in R^n between x and a, so L(x; a) consists of vectors of the form a + t(x − a) for 0 ≤ t ≤ 1:

L(x; a) = {a + t(x − a) : 0 ≤ t ≤ 1}.

We think of a vector c ∈ L(x; a) as one which is "between" x and a. We say that a set V ⊆ R^n is convex if for any x, a ∈ V, the entire line segment L(x; a) lies in V. In the case of R², this is the familiar picture: a set is convex when the straight segment joining any two of its points stays entirely inside the set.

So, a set is convex if any point between two elements of that set is itself in that set. In the case of R, a subset is convex if and only if it is an interval.

First version of Mean Value Theorem. For a first version of the Mean Value Theorem, we consider functions which map into R. Then the statement is the same as it is for single-variable functions, only we replace the single-variable derivative by the Jacobian derivative. To be precise, suppose that f : V → R is differentiable where V ⊆ R^n is open and convex. Then for any x, a ∈ V there exists c ∈ L(x; a) such that

f(x) − f(a) = Df(c)(x − a).

The right side is an ordinary matrix product: since Df(c) is 1 × n and x − a (thinking of it as a column vector) is n × 1, the product Df(c)(x − a) is 1 × 1, just as the difference f(x) − f(a) of numbers in R should be. Note that the book writes this right side instead as a dot product ∇f(c) · (x − a), where ∇f(c) = Df(c) is the (row) gradient of f and we think of x − a as a row vector. We'll use the Df(c)(x − a) notation to remain consistent with other notations we've seen involving Jacobians.

Proof. The proof of this Mean Value Theorem works by finding a way to convert f : R^n → R into a single-variable function and then using the single-variable Mean Value Theorem. Here are the details, which are also in the book. The idea is that, since at the end we want to get c being on the line segment L(x; a), we restrict our attention to inputs which come only from this line segment, in which case we can indeed interpret f after restriction as a single-variable function.

Define g : [0, 1] → V by g(t) = a + t(x − a), so that g parameterizes the line segment in question, and consider the composition f ∘ g : [0, 1] → R. Since f and g are differentiable, so is this composition, and so the single-variable Mean Value Theorem gives the existence of t ∈ (0, 1) such that

f(g(1)) − f(g(0)) = (f ∘ g)′(t)(1 − 0) = (f ∘ g)′(t).

The left side is f(x) − f(a) since g(1) = x and g(0) = a. By the chain rule, the right side is:

(f ∘ g)′(t) = Df(g(t)) Dg(t) = Df(g(t))(x − a),

where we use the fact that g′(t) = x − a for all t. Thus for c = g(t) = a + t(x − a) ∈ L(x; a), we have f(x) − f(a) = Df(c)(x − a) as desired. Note that the convexity of V guarantees that the image of g, which is the line segment L(x; a), remains within the domain of f, which is needed in order to have the composition f ∘ g make sense.
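To see the theorem in action, one can locate such a point c numerically for a concrete function. In the sketch below (the function f(x, y) = x² + y³, the endpoints, and the use of scipy's root finder are all my own choices, not from the notes), we solve for the parameter t with f(x) − f(a) = Df(a + t(x − a))(x − a):

    import numpy as np
    from scipy.optimize import brentq

    def f(p):  return p[0]**2 + p[1]**3
    def Df(p): return np.array([2*p[0], 3*p[1]**2])     # 1 x 2 Jacobian (gradient)

    a = np.array([0.0, 0.0]); x = np.array([1.0, 2.0])
    target = f(x) - f(a)
    phi = lambda t: Df(a + t*(x - a)) @ (x - a) - target  # zero at the MVT point
    t_star = brentq(phi, 0.0, 1.0)
    c = a + t_star*(x - a)
    print(c, f(x) - f(a), Df(c) @ (x - a))                # the last two agree

For this particular f and these endpoints the root lands at roughly t ≈ 0.57, and the two printed numbers coincide, as the theorem promises.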

Incorrect guess for a more general version. The above statement of the Mean Value Theorem for a function f : R^n → R looks to be pretty much the same as the statement f(x) − f(a) = f′(c)(x − a) for a function f : R → R, only we use the more general Jacobian derivative. This, together with the fact that the equation f(x) − f(a) = Df(c)(x − a) also makes sense when f : R^n → R^m, might lead us to believe that the same result should be true even in this more general setting. Indeed, now Df(c) is an m × n matrix and so its product with the n × 1 matrix x − a will give an m × 1 matrix, which is the type of thing which f(x) − f(a) should be. However, this more general version is NOT true, as the following example shows.

Take the function f : R → R² defined by f(x) = (cos x, sin x). If the above generalization of the Mean Value Theorem were true, there should exist c ∈ (0, 2π) such that

f(2π) − f(0) = Df(c)(2π − 0).

But f(2π) = (1, 0) = f(0), so this means that the 2 × 1 matrix Df(c) = (−sin c, cos c)^T should be the zero matrix, which is not possible since there is no value of c which makes sin c and cos c simultaneously zero.

Why the incorrect guess doesn't work. The issue is that, although we can apply the Mean Value Theorem to each component of f : R^n → R^m, the point c we get in the statement might vary from one component to the next. To be clear, if f = (f_1, ..., f_m) are the components of a differentiable f : R^n → R^m, for x, a ∈ R^n the Mean Value Theorem applied to f_i : R^n → R gives the existence of c_i ∈ L(x; a) such that

f_i(x) − f_i(a) = Df_i(c_i)(x − a).

Note that this gives m different points c_1, ..., c_m on this line segment, one for each component of f. However, to get f(x) − f(a) = Df(c)(x − a) we would need the same c = c_1 = c_2 = ... = c_m to satisfy the component equations above, and there is no way to guarantee that this will happen. In the concrete example we looked at above, we can find c_1 such that 1 − 1 = (−sin c_1)(2π − 0) by applying the Mean Value Theorem to the first component f_1(x) = cos x, and we can find c_2 such that 0 − 0 = (cos c_2)(2π − 0) by applying it to the second component f_2(x) = sin x, but c_1 and c_2 are different.

The upshot is that such a direct generalization of the Mean Value Theorem for functions R → R or R^n → R does not work for functions R^n → R^m when m > 1. Instead, we'll see next time that the best we can do in this higher-dimensional setting is to replace the equality being asked for by an inequality instead, which for many purposes is good enough.

Important. For a differentiable function f : V → R with V ⊆ R^n open and convex, for any x, a ∈ V there exists c ∈ L(x; a) such that f(x) − f(a) = Df(c)(x − a). When n = 1 and V = (a, b), this is the ordinary single-variable Mean Value Theorem. There is no direct generalization of this to functions f : V → R^m when m > 1 if we require a similar equality to hold.
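A quick computation makes the failure concrete (this snippet is my own illustration, not from the notes): f(2π) − f(0) = (0, 0), while the candidate right-hand side Df(c)(2π − 0) has norm 2π for every c, so the two sides are never even close.

    import numpy as np

    # f(x) = (cos x, sin x): residual ||f(2*pi) - f(0) - Df(c)*(2*pi)|| over c in (0, 2*pi)
    c = np.linspace(1e-6, 2*np.pi - 1e-6, 10001)
    candidate = 2*np.pi*np.column_stack([-np.sin(c), np.cos(c)])   # Df(c)*(2*pi - 0)
    residual = np.linalg.norm(np.zeros((c.size, 2)) - candidate, axis=1)
    print(residual.min(), residual.max())   # both equal 2*pi: no choice of c works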

April 17, 2015: More on Mean Value, Taylor's Theorem

Today we finished talking about the Mean Value Theorem, giving the correct generalization for functions which map into higher-dimensional spaces. We also spoke about Taylor's Theorem, focusing on the second-order statement.

Warm-Up. Suppose that f : V → R^m is differentiable on the open, convex set V ⊆ R^n and that there exists a ∈ V such that Df(x) = Df(a) for all x ∈ V. (So, the derivative of f is constant in the sense that the Jacobian matrix at any point is the constant matrix given by Df(a).) We show that f then has the form f(x) = Ax + b for some m × n matrix A and some b ∈ R^m. This is a higher-dimensional analogue of the claim that a single-variable function with constant derivative must look like f(x) = ax + b, where the coefficient a is now replaced by a constant matrix and b is replaced by a constant vector.

Let f = (f_1, ..., f_m) denote the components of f and fix x ∈ V. For each 1 ≤ i ≤ m, applying the Mean Value Theorem to f_i : V → R gives the existence of c_i ∈ L(x; a) such that

f_i(x) − f_i(a) = Df_i(c_i)(x − a).

But now Df(c_i) = Df(a) for any i, so we get

f_i(x) − f_i(a) = Df_i(a)(x − a) for each i = 1, ..., m.

The 1 × n row vector Df_i(a) gives the i-th row of the Jacobian matrix Df(a) of f, so the right sides of the above expressions as i varies give the rows of the matrix product Df(a)(x − a). The left sides f_i(x) − f_i(a) give the rows of the difference

f(x) − f(a) = ( f_1(x) − f_1(a), ..., f_m(x) − f_m(a) )^T,

so all-in-all the equations above for the different f_i fit together to give the equality

f(x) − f(a) = Df(a)(x − a).

Thus f(x) = Df(a)x + (f(a) − Df(a)a), so setting A = Df(a) and b = f(a) − Df(a)a gives f(x) = Ax + b as the required form of f.

Second version of Mean Value Theorem. As we went through last time, a direct generalization of the Mean Value Theorem phrased as an equality doesn't work when f maps into R^m with m > 1, the issue being that the c_i's we get when applying the Mean Value Theorem component-wise might be different. Note that in the Warm-Up above we were able to get around this since the function there satisfied Df(x) = Df(a) for all x, so that the different Df_i(c_i)'s could be replaced using Jacobians evaluated at the same a throughout.

Instead, for a function mapping into a higher-dimensional R^m we have the following result, which we still view as a type of Mean Value Theorem. (This is sometimes called the Mean Value Inequality.) Suppose that f : V → R^m is continuously differentiable (i.e. C¹) on the open subset V ⊆ R^n. (Note the new requirement that the partial derivatives of f be continuous on V.) Then for any compact and convex K ⊆ V there exists M > 0 such that

‖f(x) − f(a)‖ ≤ M ‖x − a‖ for all x, a ∈ K.
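Before the constant M is unpacked below, here is a small numerical illustration (the function f, the set K = [0, 1]², and the grid approximation of the supremum are my own choices): we estimate M = sup over K of the operator norm ‖Df(x)‖ on a grid and then spot-check the inequality at random pairs of points of K.

    import numpy as np

    def f(p):
        x, y = p
        return np.array([np.sin(x*y), x**2 - y])
    def Df(p):
        x, y = p
        return np.array([[y*np.cos(x*y), x*np.cos(x*y)], [2*x, -1.0]])

    grid = np.linspace(0, 1, 101)
    # grid estimate of M = sup ||Df(x)|| over K = [0,1]^2 (operator/spectral norm);
    # a grid slightly underestimates the true supremum, but is plenty for a sanity check
    M = max(np.linalg.norm(Df((x, y)), 2) for x in grid for y in grid)

    rng = np.random.default_rng(1)
    pairs = rng.random((200, 2, 2))                     # 200 random pairs (a, x) in K
    ratios = [np.linalg.norm(f(x) - f(a)) / np.linalg.norm(x - a) for a, x in pairs]
    print(max(ratios), M)                               # the max ratio stays below M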

36 So, in the most general higher-dimensional setting the Mean Value Theorem only gives an inequality instead of an equality, but for many purposes we ll see this is good enough. Also, the constant M can be explicitly described as a certain supremum, which we ll elaborate on below; this explicit description is just as important as the fact that such a bound exists, and is where the requirements that f have continuous partial derivatives and that K is compact come in. Idea behind the proof. The Mean Value Inequality is Corollary in the book, and the proof is given there. Here we only give some insights into the proof and leave the full details to the book. To start with, the bound M is explicitly given by M := sup Df(x). x K In order to see that this quantity is defined, note that the continuity of the partial derivatives of f imply that the function x Df(x) mapping x to the Jacobian matrix of f at x is continuous when we interpret the m n matrix Df(x) as being a vector in R mn. The matrix norm map Df(x) Df(x) is also continuous, so the composition K R defined by x Df(x) is continuous as well. Since K is compact, the Extreme Value Theorem implies that this composition has a maximum value, which is the supremum M defined above. (So, this supremum is actually a maximum, meaning that M = Df(c) for some c K.) The book s proof then works by turning f : V R m into a function which maps into R alone by composing with the function g : R m R defined by g(y) = (f(x) f(a)) y, and then applying the equality-version of the Mean Value Theorem to this composition g f : V R and using the chain rule to compute D(g f). (Actually, the book uses what it calls Theorem 11.32, which we didn t talk about but makes precise the idea mentioned above of turning a function mapping into R m into one which maps into R.) Various manipulations and inequalities then give the required statement, and again you can check the book for the details. To give an alternate idea, we start with the point at which we got stuck before when trying to generalize the Mean Value Theorem as an equality directly, namely the fact that after applying the Mean Value Theorem component-wise we get f i (x) f i (a) = Df i (c i )(x a) for various c 1,..., c m. These equalities for i = 1,..., m then give the equality Df 1 (c 1 ) f(x) f(a) =. (x a) Df m (c m ) where the Df i (c i ) are row vectors. (Note again that we cannot express this resulting matrix as the Jacobian matrix at a single point unless the c i s were all the same.) Taking norms and using Ay A y gives: Df 1 (c 1 ) f(x) f(a). x a Df m (c m ) Thus if we can bound the norm of the matrix in question we would be done. 36

The trouble is that, because the rows are evaluated at different c_i's, it is not at all clear that this entire norm is indeed bounded by the M defined above, since that M depended on Jacobian matrices in which all rows Df_i(x) are evaluated at the same x. Still, this suggests that it is not beyond the realm of reason to believe that the norm we end up with here should be bounded, in particular since, for the same reasons as before, the compactness of K and continuity of the partials of f imply that the norm of each row here attains a supremum value. Turning this into an argument that the given M works as a bound requires some linear algebra which we'll skip, but it provides an alternate approach to the book's proof.

Important. For f : V → R^m continuously differentiable with V ⊆ R^n open, for any convex and compact K ⊆ V we have

‖f(x) − f(a)‖ ≤ M ‖x − a‖ for all x, a ∈ K, where M = sup_{x ∈ K} ‖Df(x)‖.

Moreover, there exists c ∈ K such that M = ‖Df(c)‖.

Example. An easy consequence of the Mean Value Theorem (inequality version) is that any C¹ function on a compact and convex set is uniformly continuous. This is something we already know to be true since continuous functions on compact sets are always uniformly continuous, but the point is that we can give a proof in this special case without making use of this more general fact. So, suppose that f : V → R^m is C¹ and that K ⊆ V is compact and convex. By the Mean Value Theorem there exists M > 0 such that

‖f(x) − f(a)‖ ≤ M ‖x − a‖ for all x, a ∈ K.

Let ε > 0. Then for δ = ε/M > 0 we have that if ‖x − y‖ < δ for some x, y ∈ K, then

‖f(x) − f(y)‖ ≤ M ‖x − y‖ < Mδ = ε,

showing that f is uniformly continuous on K as claimed.

Taylor's Theorem. We finish with a version of Taylor's Theorem in the multivariable setting. Recall that the single-variable Taylor's Theorem from first-quarter analysis is a generalization of the single-variable Mean Value Theorem: this latter theorem gives the existence of c such that

f(x) = f(a) + f′(c)(x − a),

and (the second-order) Taylor's Theorem gives the existence of c satisfying

f(x) = f(a) + f′(a)(x − a) + (1/2) f″(c)(x − a)².

Of course, Taylor's Theorem gives higher-order statements as well, but the second-order statement is the only one we'll give explicitly in the multivariable setting.

Now we move to the multivariable setting. If f : V → R is differentiable where V ⊆ R^n is open and convex, the Mean Value Theorem gives c ∈ L(x; a) such that

f(x) = f(a) + Df(c)(x − a).

In the case n = 2, with x = (x, y), a = (a, b), and c = (c, d), this equation explicitly looks like

f(x, y) = f(a, b) + f_x(c, d)(x − a) + f_y(c, d)(y − b).

If now f has second-order partial derivatives on V, (the second-order) Taylor's Theorem gives the existence of c ∈ L(x; a) such that

f(x) = f(a) + Df(a)(x − a) + (1/2)(x − a)^T Hf(c)(x − a),

where Hf denotes the Hessian (i.e. second derivative) of f. Note that, as in the single-variable case, it is only the second derivative term Hf(c) which is evaluated at the "between" point c. To make the notation clear, in the final term we are thinking of x − a as a column vector, so the transpose (x − a)^T denotes the corresponding row vector. Then (x − a)^T Hf(c)(x − a) denotes the usual matrix product of a 1 × n vector, an n × n matrix, and an n × 1 vector, which gives a 1 × 1 matrix in the end. In the n = 2 case, this product explicitly gives:

(x − a)^T Hf(c)(x − a) = ( x − a   y − b ) [ f_xx(c, d)  f_xy(c, d) ;  f_yx(c, d)  f_yy(c, d) ] ( x − a   y − b )^T
= f_xx(c, d)(x − a)² + f_xy(c, d)(x − a)(y − b) + f_yx(c, d)(y − b)(x − a) + f_yy(c, d)(y − b)²,

which is analogous to the f″(c)(x − a)² term in the single-variable version: we have one term for each possible second-order partial derivative, and each is multiplied by an (x − a) and/or (y − b) depending on which two variables the second-order partial derivative is taken with respect to. (The book calls this expression the second-order total differential of f and denotes it by D^(2) f.) The fact that we can more succinctly encode all of these second-order terms using the single Hessian expression is the reason why we'll only consider the second-order statement of Taylor's Theorem; such a nice expression in terms of matrices is not available once we move to third-order and beyond.

Important. If f : V → R, where V ⊆ R^n is open and convex, has second-order partial derivatives on V, then for any x, a ∈ V there exists c ∈ L(x; a) such that

f(x) = f(a) + Df(a)(x − a) + (1/2)(x − a)^T Hf(c)(x − a).

Setting x = a + h for some h, we can also write this as

f(a + h) = f(a) + Df(a)h + (1/2) h^T Hf(c)h.

This gives a way to estimate the error which arises when approximating f(x) using the linear approximation f(x) ≈ f(a) + Df(a)(x − a).

April 20, 2015: Inverse Function Theorem

Today we spoke about the Inverse Function Theorem, which is one of our final big two theorems concerning differentiable functions. This result is truly a cornerstone of modern analysis as it leads to numerous other facts and techniques; in particular, it is the crucial point behind the Implicit Function Theorem, which is our other big remaining theorem. These two theorems give us ways to show that equations have solutions using only Jacobians.

Warm-Up 1. Suppose that f : R^n → R^n is C¹ and that there exists 0 < C < 1 such that

‖Df(x)‖ ≤ C for all x ∈ B_r(a)

for some ball B_r(a). (In class we only did this for a ball centered at 0 and for C = 1/2, but this general version is no more difficult.) We claim that f then has a unique fixed point on the closed ball B̄_r(a), meaning that there exists a unique x ∈ B̄_r(a) such that f(x) = x. The idea is that the given condition will imply that f is a contraction on this closed ball, and so the Contraction Mapping Principle we saw at the end of last quarter will apply.

First we show that the given bound also applies when x lies in the closure B̄_r(a). Since f is continuously differentiable, the Jacobian map x ↦ Df(x) is continuous, so Df(x) → Df(a) as x → a. The operation of taking the norm of a matrix is also continuous, so ‖Df(x)‖ → ‖Df(a)‖ as x → a. Thus if (x_n) is a sequence of points in B_r(a) which converges to x ∈ B̄_r(a), taking limits in ‖Df(x_n)‖ ≤ C gives ‖Df(x)‖ ≤ C as claimed. Hence ‖Df(x)‖ ≤ C for all x ∈ B̄_r(a), so the supremum of such norms is also less than or equal to C.

Now, since B̄_r(a) is compact (it is closed and bounded) and convex, the Mean Value Theorem implies that

‖f(x) − f(y)‖ ≤ C ‖x − y‖ for all x, y ∈ B̄_r(a).

Since 0 < C < 1, this says that f is a contraction on B̄_r(a). Since B̄_r(a) is closed in the complete space R^n, it is complete itself, and so the Contraction Mapping Principle tells us that f has a unique fixed point in B̄_r(a) as claimed.

Warm-Up 2. We show that for h and k small enough, the values of

cos(π/4 + h) sin(π/4 + k) and 1/2 − (1/2)h + (1/2)k

agree to 3 decimal places. Thus the expression on the right will provide a fairly accurate approximation to the expression on the left for such h, k. Define f : R² → R by f(x, y) = cos x sin y. For a given h = (h, k) and a = (π/4, π/4), by the second-order statement of Taylor's Theorem there exists c = (c, d) ∈ L(a; a + h) such that

f(a + h) = f(a) + Df(a)h + (1/2) h^T Hf(c)h.

The term on the left is cos(π/4 + h) sin(π/4 + k). We compute:

Df(x) = ( −sin x sin y   cos x cos y ) and Df(a) = ( −1/2   1/2 ),

so

f(a) + Df(a)h = 1/2 − (1/2)h + (1/2)k.

Thus

cos(π/4 + h) sin(π/4 + k) − (1/2 − (1/2)h + (1/2)k) = f(a + h) − [f(a) + Df(a)h] = (1/2) h^T Hf(c)h,

and hence

|cos(π/4 + h) sin(π/4 + k) − (1/2 − (1/2)h + (1/2)k)| ≤ (1/2) |h^T Hf(c)h|.

Now, we compute:

Hf(c) = [ −cos c sin d   −sin c cos d ;  −sin c cos d   −cos c sin d ].

Note that each of the entries in this matrix has absolute value smaller than or equal to 1. We claim that ‖Hf(c)‖ ≤ 2√2. Indeed, denoting this matrix by [ p  q ;  r  s ] to keep notation simpler, we have for x = (x, y) of norm 1:

‖Hf(c)x‖ = ‖( px + qy,  rx + sy )‖ = √(|px + qy|² + |rx + sy|²) ≤ √((|p||x| + |q||y|)² + (|r||x| + |s||y|)²) ≤ √((1 + 1)² + (1 + 1)²) = 2√2,

where we use the fact that |x|, |y| ≤ 1 since x has norm 1. Thus the supremum of the values ‖Hf(c)x‖ as x varies over vectors of norm 1 is also bounded by 2√2, and this supremum is the definition of ‖Hf(c)‖. Hence we have:

|cos(π/4 + h) sin(π/4 + k) − (1/2 − (1/2)h + (1/2)k)| ≤ (1/2) |h^T Hf(c)h| ≤ (1/2) ‖Hf(c)‖ ‖h‖² ≤ √2 ‖h‖²,

and so as long as ‖h‖ = ‖(h, k)‖ is small enough that √2 ‖h‖² is less than 5 × 10⁻⁴, we have

|cos(π/4 + h) sin(π/4 + k) − (1/2 − (1/2)h + (1/2)k)| < 5 × 10⁻⁴,

which implies that cos(π/4 + h) sin(π/4 + k) and 1/2 − (1/2)h + (1/2)k agree to 3 decimal places as required.

Inverse Function Theorem. The Inverse Function Theorem is a result which derives local information about a function near a point from infinitesimal information about the function at that point. In the broadest sense, it gives a simple condition involving derivatives which guarantees that certain equations have solutions. Consider first the single-variable version: if f : (c, d) → R is differentiable and f′(a) ≠ 0 for some a ∈ (c, d), then there exist intervals around a and f(a) on which f is invertible and where f⁻¹ is differentiable with

(f⁻¹)′(b) = 1/f′(a), where b = f(a).

The key parts to this are that f⁻¹ exists near a and f(a) and that it is differentiable; the expression for the derivative of f⁻¹ (which says that "the derivative of the inverse is the inverse of the derivative") comes from the chain rule applied to f⁻¹(f(x)) = x. This makes some intuitive sense geometrically: if f′(a) ≠ 0, it is either positive or negative, and so roughly the graph of f should look like a strictly increasing or strictly decreasing curve near a. Thus, near a it does appear that f is invertible (if f sends x to y, the inverse f⁻¹ sends y to x), and in such a picture the inverse also appears to be differentiable. Of course, it takes effort to make all this intuition precise.

Here is then the multivariable analog: Suppose that f : V → R^n is C¹ on an open V ⊆ R^n and that Df(a) is invertible at some a ∈ V. (This invertibility condition is equivalent to det Df(a) ≠ 0, which is how the book states it.) Then there exists an open set W ⊆ V containing a such that f(W) is open in R^n, f : W → f(W) is invertible, and f⁻¹ : f(W) → W is C¹ with

D(f⁻¹)(b) = [Df(a)]⁻¹ where b = f(a).

The key parts again are that f⁻¹ exists near a and f(a) and that it is continuously differentiable. The fact that the Jacobian matrix of the inverse of f is the inverse of the Jacobian matrix of f follows from the chain rule as in the single-variable case. Often times we won't make the open set W explicit and will simply say that f is locally invertible at a or f is invertible near a.

Intuition and proof. Thus, the Inverse Function Theorem says that if f is infinitesimally invertible at a point, it is actually invertible near that point with a differentiable inverse. To give some idea as to why we might expect this to be true, recall that near a the function f is supposed to be well-approximated by Df(a) in the sense that:

f(x) ≈ f(a) + Df(a)(x − a) for x near a.

If Df(a) is invertible, the function on the right is invertible (it is a sum of an invertible linear transformation with a constant vector) and so this approximation suggests that f should be invertible near a as well. Moreover, the function on the right has an inverse which looks like c + [Df(a)]⁻¹y, where y is the variable and c is a constant vector, and such an inverse is itself differentiable with Jacobian matrix [Df(a)]⁻¹. We might expect that this inverse approximates f⁻¹ well, so we would guess that f⁻¹ is also differentiable with Jacobian matrix [Df(a)]⁻¹.

The hard work comes in actually proving all of these claims, which is possibly the most difficult proof we've come across so far among all quarters of analysis. Indeed, in the book this proof is broken up into pieces and takes multiple pages to go through. Going through all the details isn't so important for us, so here we will only give an outline and leave the precise proof to the book. (However, we will give a proof of the first step of the outline which is different than the book's proof.) Here is the outline:

First, the condition that Df(a) is invertible is used to show that f is invertible near a. This is essentially the content of a lemma in the book, which is roughly a big linear algebraic computation. I don't think that this sheds much light on what is actually going on, so we will give an alternate proof of this claim below. In particular, showing that f is invertible essentially requires that we show we can solve y = f(x) for x in terms of y, and the book's proof gives no indication as to how this might be done. (This is the sense in which I claimed before that the Inverse Function Theorem is a result about showing that certain equations have solutions.)

Second, once we know that f⁻¹ exists near a, the invertibility of Df(a) is used again to show that f⁻¹ is actually continuous near a. This is essentially the content of another lemma in the book. (Note that the book does this part before the first part.)

Finally, it is shown that f⁻¹ is continuously differentiable and that D(f⁻¹) = (Df)⁻¹, which is done in the proof of the corresponding theorem in the book.

Again, this is a tough argument to go through completely, and I wouldn't suggest you do so until you have some free time, say over the summer when you're feeling sad that Math 320 is over. But, as promised, we give an alternate proof of the first part, which helps to make it clear why y = f(x) should be solvable for x in terms of y, as needed to define the inverse function f⁻¹. The proof we give depends on properties of contractions and the result of the first Warm-Up from today. The point is that we can rephrase the existence of a solution to y = f(x) as a fixed point problem, and then the Warm-Up applies.

Proof of first step of the Inverse Function Theorem. Fix y ∈ R^n near f(a), with how near this must be still to be determined. Define the function g : V → R^n by setting

g(x) = x + [Df(a)]⁻¹(y − f(x)).

Note that g(x) = x if and only if [Df(a)]⁻¹(y − f(x)) = 0, which since [Df(a)]⁻¹ is invertible is true if and only if y = f(x). Thus x satisfies g(x) = x if and only if y = f(x), which gives the connection between solving y = f(x) for x and finding a fixed point x of g. The function g is differentiable on V since it is a sum of differentiable functions, and its Jacobian matrix is given by:

Dg(x) = I + [Df(a)]⁻¹(0 − Df(x)) = I − [Df(a)]⁻¹ Df(x),

where I denotes the n × n identity matrix (which is the Jacobian matrix of the identity function x ↦ x), [Df(a)]⁻¹ remains as is since it behaves like a constant with respect to x, and the Jacobian of the y term is 0 since it does not depend on x. Writing the identity matrix as I = [Df(a)]⁻¹ Df(a), we can rewrite the above as:

Dg(x) = [Df(a)]⁻¹[Df(a) − Df(x)], so ‖Dg(x)‖ ≤ ‖[Df(a)]⁻¹‖ ‖Df(a) − Df(x)‖.

Since f is C¹, the map x ↦ Df(x) is continuous, so Df(x) → Df(a) as x → a. Thus for x close enough to a we can make ‖Df(a) − Df(x)‖ however small we like, so in particular we can make it small enough so that

‖Dg(x)‖ ≤ ‖[Df(a)]⁻¹‖ ‖Df(a) − Df(x)‖ ≤ 1/2

for x close enough to a. This "x close enough to a" business is where the open set W ⊆ V in the statement of the Inverse Function Theorem comes from, and hence the y's we are considering should be near f(a) in the sense that they lie in the image f(W). Thus g is a contraction on this W, and so the Warm-Up (or rather the Warm-Up applied to some smaller closed ball B̄_r(a) contained in W) tells us that g has a unique fixed point x in W, which as mentioned previously is then a unique x satisfying f(x) = y. The existence of such an x says that f is onto near a, and the uniqueness of x says that f is one-to-one, so f is invertible on W with inverse given by f⁻¹(y) = x.
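The proof above is effectively an algorithm, and it is worth seeing it run. The sketch below (the particular f, the base point a, and the target y are my own choices, not from the notes) iterates the contraction g(x) = x + [Df(a)]⁻¹(y − f(x)) and recovers the solution of f(x) = y near a:

    import numpy as np

    def f(p):
        x, y = p
        return np.array([x + 0.3*np.sin(y), y + 0.3*np.cos(x)])
    def Df(p):
        x, y = p
        return np.array([[1.0, 0.3*np.cos(y)], [-0.3*np.sin(x), 1.0]])

    a = np.array([0.2, 0.5])
    A_inv = np.linalg.inv(Df(a))                  # [Df(a)]^{-1}, fixed once and for all
    y_target = f(a) + np.array([0.01, -0.02])     # a point near f(a)

    x = a.copy()
    for _ in range(50):
        x = x + A_inv @ (y_target - f(x))         # the contraction g from the proof
    print(x, f(x) - y_target)                     # f(x) is (numerically) y_target

The printed residual is essentially zero, and the recovered x is the value f⁻¹(y_target) whose existence the theorem guarantees.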

43 Important. If f is C 1 and Df(a) is invertible, then f is invertible near a and has a C 1 inverse. We should view this as a statement that equations of the form f(x) = y can be solved for x in terms of y in a continuously differentiable way. What s the point? The Inverse Function Theorem seems like a nice enough result in that it relates infinitesimal invertibility to actual invertibility, but nonetheless it s fair to ask why we care about this. Although the Inverse Function Theorem has many applications we won t talk about, the point for us is that one of the main applications is to derive the so-called Implicit Function Theorem, which is truly remarkable. We ll spend the next few lectures talking about this Implicit Function Theorem and its applications, and I hope you ll be convinced at the end that it and hence the Inverse Function Theorem really are amazing in that they tell us that things exist without having to know what those things actually are. April 22, 2015: Implicit Function Theorem Today we started talking about the Implicit Function Theorem, which is our final topic in the differentiability portion of the course. As mentioned last time, this theorem is a consequence of the Inverse Function Theorem and gives a way to show that equations have differentiable solutions without having to explicitly derive them. We will continue talking about this next time, where we ll see some interesting applications. Warm-Up. Suppose that A is an n n invertible matrix and that f : R n R n is defined by f(x) = Ax + g(x) where g : R n R n is a C 1 function such that g(x) M x 2 for some M > 0 and all x R n. We show that f is locally invertible near 0 R n, meaning that there exists an open set containing 0 such that f is invertible on this open set. The intuition is that the condition on g says roughly that g is negligible as x 0, so that the Ax term (with A invertible) should dominate the behavior of f. First note that g(0) M 0 2 = 0 implies that g(0) = 0. Moreover, so g(h) h M h 2 = M h 0 as h 0, h g(h) g(0) 0h lim = 0 h 0 h where 0 denotes the zero matrix and we use the fact that g(0) = 0. This shows that Dg(0) = 0 since the zero matrix satisfies the required property of Dg(0) in the definition of differentiability at 0 for g. Since Ax and g(x) are C 1 at 0, f is as well and Df(0) = A + Dg(0) = A. Since this is invertible the Inverse Function Theorem implies that f is locally invertible near 0 as claimed. Implicit functions. Say we are given an equation of the form F (x, y) = 0 43

where F is a function of two variables. We want to understand conditions under which such an equation implicitly defines y as a function of x. For instance, we know that

x² + y² − 1 = 0

is the equation of the unit circle in R², and that we can solve for x or y to get x = ±√(1 − y²) or y = ±√(1 − x²) respectively. In this case, we can solve for either x or y explicitly in terms of the other variable, at least locally, meaning that such an explicit solution characterizes one variable in terms of the other over a portion of the entire unit circle: x = −√(1 − y²) gives the left half of the circle while the positive square root gives the right half, and y = −√(1 − x²) gives the bottom half while the positive square root the top half. Moreover, once we have these explicit functions we can compute derivatives such as dy/dx (giving slopes along the unit circle) or dx/dy. However, if F(x, y) = 0 was given by something more complicated, say:

x³y² − xy + y − x²y⁴ − 4 = 0,

solving for x or y explicitly in terms of the other becomes impossible. Nonetheless, the Implicit Function Theorem guarantees that, under some mild Jacobian condition, it will be possible to implicitly solve for x or y in terms of the other, so that we can locally think of the curve F(x, y) = 0 as the graph of a single-variable function.

Higher dimensional example. Consider now the system of (non-linear) equations

xu² + yv² + xy = 11
xv² + yu² − xy = −1

for (u, v, x, y) ∈ R⁴, with one solution given by (1, 1, 2, 3). We can ask if there are any other solutions, and if so, what type of geometric object these equations characterize in R⁴. We claim that it is possible to (implicitly) solve these equations for u and v in terms of x and y. Define F : R² × R² → R² (we're writing the domain R⁴ as R² × R² in order to separate the variables u, v we want to solve for from the variables x, y they will be expressed in terms of) by

F(u, v, x, y) = (xu² + yv² + xy − 11, xv² + yu² − xy + 1),

so that the given system of equations is the same as F(u, v, x, y) = 0. Denoting the components of F by F = (F₁, F₂), we use DF_(u,v) to denote the partial Jacobian matrix obtained by differentiating F with respect to only u and v:

DF_(u,v) = [ ∂F₁/∂u  ∂F₁/∂v ;  ∂F₂/∂u  ∂F₂/∂v ] = [ 2xu  2yv ;  2yu  2xv ].

(Note that the book does NOT use this notation.) The partial Jacobian determinant is then

∂(F₁, F₂)/∂(u, v) := det DF_(u,v) = 4x²uv − 4y²uv.

Evaluating at the point (1, 1, 2, 3), which we know is a solution of the original system of equations, we get:

∂(F₁, F₂)/∂(u, v) (1, 1, 2, 3) = 16 − 36 = −20,

so since this is nonzero the partial Jacobian matrix

DF_(u,v)(1, 1, 2, 3) = [ 4  6 ;  6  4 ]

is invertible. The Implicit Function Theorem (which we'll state in a bit) thus guarantees that it is possible to solve for u = u(x, y) and v = v(x, y) in terms of x and y (at least implicitly) locally near (1, 1, 2, 3), giving many points (u(x, y), v(x, y), x, y) which satisfy the equations F(u, v, x, y) = 0. Thus, even without knowing what u(x, y) and v(x, y) explicitly are, we know that the given equations have infinitely many solutions near (1, 1, 2, 3) and that near this point the equations define a 2-dimensional surface in R⁴, where we know this is 2-dimensional since the resulting parametric equations u(x, y) and v(x, y) in the end depend on two parameters. In this case, explicitly solving for u and v in terms of x and y is not possible, but the point is that we can answer the questions we want without knowing these explicit solutions. Moreover, as we'll see, the Implicit Function Theorem also gives a way to explicitly compute the Jacobian matrix obtained by differentiating u(x, y) and v(x, y) with respect to x and y, so that in the end, even though we don't know u(x, y) and v(x, y), we do know their partial derivatives and the fact that they satisfy the system of equations F(u, v, x, y) = 0.

Implicit Function Theorem. Here then is the statement. Suppose that F : R^n × R^k → R^n is C¹ and that F(x₀, t₀) = 0, where x₀ ∈ R^n and t₀ ∈ R^k. (The x's in R^n are the variables we want to solve for and the t's in R^k are the variables the x's will be expressed in terms of.) If the partial Jacobian matrix DF_x(x₀, t₀) is invertible, then there exists an open set W ⊆ R^k containing t₀ and a unique C¹ function g : W → R^n such that g(t₀) = x₀ and F(g(t), t) = 0. Moreover, the Jacobian matrix of g is given by

Dg(t) = −[DF_x(g(t), t)]⁻¹ DF_t(g(t), t).

Some remarks are in order. Thinking of F(x, t) = 0 as a system of n non-linear equations in n + k variables, the goal is to solve for the n variables in x in terms of the k variables in t. Note that, in theory, we can solve for as many variables as there are equations. The partial Jacobian matrix DF_x is obtained by differentiating with respect to the variables we want to solve for, and the fact that there are as many of these as there are equations (i.e. components of F) guarantees that DF_x is a square matrix, so that it makes sense to talk about it being invertible. The function g : W → R^n obtained has as its components the expressions for x = g(t) in terms of t we are after, which hold only locally on W near t₀. The fact that F(g(t), t) = 0 says that the points (x, t) = (g(t), t) indeed satisfy the equations given by the components of F, so we get one such simultaneous solution for each t ∈ W. Said another way, this says that locally near (x₀, t₀), the object defined by F(x, t) = 0 is given by the graph of g.

So, the Implicit Function Theorem essentially says that if we have one solution of a system of non-linear equations and an invertibility condition on a certain partial Jacobian, we can solve for some variables in terms of the others. In the higher-dimensional example given above, the function g is the one defined by g(x, y) = (u(x, y), v(x, y)), where u(x, y) and v(x, y) are the implicit expressions for u and v in terms of x and y. We will look at the idea behind the proof (which depends on the Inverse Function Theorem) next time.
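To make the example above concrete, one can compute the implicit functions u(x, y), v(x, y) numerically near (2, 3) with a root finder and compare a difference quotient of g = (u, v) against the formula Dg = −[DF_(u,v)]⁻¹ DF_(x,y). The sketch below uses scipy's fsolve; that choice, the starting guess, and the step size are my own, not the book's:

    import numpy as np
    from scipy.optimize import fsolve

    def F(uv, xy):
        u, v = uv; x, y = xy
        return [x*u**2 + y*v**2 + x*y - 11, x*v**2 + y*u**2 - x*y + 1]

    def g(xy):                                    # the implicit function, numerically
        return fsolve(F, x0=[1.0, 1.0], args=(xy,))

    xy0 = np.array([2.0, 3.0]); u0, v0 = g(xy0)   # recovers (1, 1)
    DF_uv = np.array([[2*xy0[0]*u0, 2*xy0[1]*v0], [2*xy0[1]*u0, 2*xy0[0]*v0]])
    DF_xy = np.array([[u0**2 + xy0[1], v0**2 + xy0[0]],
                      [v0**2 - xy0[1], u0**2 - xy0[0]]])
    Dg_formula = -np.linalg.inv(DF_uv) @ DF_xy

    h = 1e-6
    Dg_numeric = np.column_stack([(g(xy0 + h*e) - g(xy0))/h for e in np.eye(2)])
    print(Dg_formula); print(Dg_numeric)          # the two match to several decimals

So even though u(x, y) and v(x, y) have no closed form, their values and their partial derivatives are both computable, which is exactly the practical content of the theorem.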
Note that the statement of this theorem in the book does not include the part about the explicit expression for the Jacobian of g.

Important. Using an invertibility criterion, the Implicit Function Theorem gives a way to show that systems of equations have solutions, by showing that locally these equations characterize the graph

of a function. This helps to understand the object defined by those equations, by parameterizing solutions in terms of only some of the variables involved.

Back to two-variable case. Let us see what the Implicit Function Theorem looks like in the case where F : R² → R, so that F(x, y) = 0 describes a curve in R². Say F(a, b) = 0. The partial Jacobian matrix DF_y(a, b) = ( F_y(a, b) ) is just a 1 × 1 matrix, so the condition required is that F_y(a, b) ≠ 0. Then the Implicit Function Theorem gives a function y = f(x) which expresses y implicitly as a function of x, so that F(x, y) = 0 is (locally) the graph of f : I → R where I ⊆ R is an open interval. In particular, for the point (a, b) satisfying F(x, y) = 0 we have b = f(a). The expression

Dg(t) = −[DF_x(g(t), t)]⁻¹ DF_t(g(t), t)

in this case becomes

f′(a) = −F_x(a, b)/F_y(a, b),

allowing us to compute the slope of the graph of f (and hence of the curve F(x, y) = 0) at x = a in terms of the partial derivatives of F. (Note that this expression for f′ was derived back in the Warm-Up from April 15th using the chain rule.)

April 24, 2015: More on Implicit Functions

Today we spoke more about the Implicit Function Theorem, giving an outline of the proof and some possible applications. We also pointed out a linear algebraic analog to keep in mind which helps to get at the heart of the Implicit Function Theorem.

Warm-Up. Suppose that T : R^n × R^k → R^n is a linear transformation, which we write as

T(x, t) = Ax + Bt

where (x, t) ∈ R^n × R^k, A is an n × n matrix, and B is an n × k matrix. (All together A and B form an n × (n + k) matrix ( A  B ), which is the matrix of T : R^{n+k} → R^n. With this notation, the A part is multiplied only by the first n entries x of a vector in R^{n+k} and the B part is multiplied only by the final k entries t.) Supposing that A is invertible and that T(x, t) = 0, we show that we can solve for x in terms of t.

The partial Jacobian matrix DT_x is simply A, since only A involves the variables given in x. Since this is invertible and linear transformations are C¹, the Implicit Function Theorem implies that we can indeed solve for x in terms of t in the system of equations T(x, t) = 0, so we are done.

Of course, using what we know about matrices, we can actually solve for x in terms of t explicitly in this case. After all, T(x, t) = 0 is Ax + Bt = 0, and since A is invertible we have Ax = −Bt, or

x = −A⁻¹Bt.

So actually, we didn't need the Implicit Function Theorem to solve this, only linear algebra. The point is not to view this result as a consequence of the Implicit Function Theorem as we did above, but rather to view it as motivation for the Implicit Function Theorem. The equation Ax + Bt = 0

gives a system of n linear equations in n + k unknowns (x, t), and the condition that A is invertible says that the corresponding augmented matrix ( A  B ) has rank n. Solving these equations using Gaussian elimination (i.e. using row operations to reduce to echelon form) will in the end express the first n variables x in terms of the final k free variables t, an expression which is explicitly given by x = −A⁻¹Bt. Thus, we should view the Implicit Function Theorem as a non-linear analog of this linear algebraic fact! The requirement that DF_x be invertible is a type of rank or linear independence condition, and the implicit function g expresses the dependent variables x in terms of the independent or free variables t. The claimed expression for the Jacobian matrix of g:

Dg(t) = −[DF_x(g(t), t)]⁻¹ DF_t(g(t), t)

mimics the x = −A⁻¹Bt equation obtained in the linear-algebraic case.

Important. The Implicit Function Theorem is a non-linear analog of the fact that systems of linear equations with more unknowns than equations and full rank always have infinitely many solutions, which can moreover be expressed solely in terms of free variables.

Idea behind the proof of Implicit Function Theorem. We will here give an idea behind the proof of the Implicit Function Theorem, only going as far as explaining where the implicit function g and its Jacobian come from. Check the book for full details and for the rest of the proof.

First, the idea that the Implicit Function Theorem should be a consequence of the Inverse Function Theorem comes from the point of view that both are statements that, under suitable conditions on Jacobians, certain equations have solutions: F(x, t) = 0 in the case of the Implicit Function Theorem, and y = f(x) in the case of the Inverse Function Theorem, since finding f⁻¹ amounts to solving y = f(x) for x in terms of y. However, the Inverse Function Theorem only applies to functions between spaces of the same dimension, while here in the Implicit Function Theorem we have F : R^n × R^k → R^n, so we need a way of extending this function F to one between spaces of the same dimension.

Define F̃ : R^n × R^k → R^n × R^k by

F̃(x, t) = (F(x, t), t),

so the extra dimensions required come from keeping track of t. The Jacobian matrix of F̃ then looks like:

DF̃ = [ DF_x  DF_t ;  0  I ],

where DF_x and DF_t are the partial Jacobian n × n and n × k matrices respectively and come from differentiating the F(x, t) component of F̃, the 0 in the lower left denotes the k × n zero matrix and comes from differentiating the t component of F̃ with respect to x, and the I in the lower right denotes the k × k identity matrix and comes from differentiating the t component with respect to t. At (x₀, t₀) we thus have:

DF̃(x₀, t₀) = [ DF_x(x₀, t₀)  DF_t(x₀, t₀) ;  0  I ].

Since the two main diagonal blocks DF_x(x₀, t₀) and I are invertible (recall the assumptions of the Implicit Function Theorem), this entire Jacobian matrix is invertible as well. Thus by the Inverse Function Theorem, F̃ is locally invertible near (x₀, t₀). The portion of the inverse of F̃:

(F̃)⁻¹ : (F(x, t), t) ↦ (x, t)

which sends the second component t of the input to the first component x of the output then gives the implicit function g(t) = x which is asked for in the Implicit Function Theorem. The fact that g is C^1 comes from the fact that the inverse of F̃ is C^1, which is part of the statement of the Inverse Function Theorem.

The claimed expression for Dg as

Dg(t) = -[DF_x(g(t), t)]^{-1} DF_t(g(t), t)

can be justified in multiple ways. One way is via the chain rule, which is done in the solution to Problem 7 from the collection of practice problems for the first midterm. Another way is as follows. First consider a 2 × 2 matrix of the form

[ a  b ]
[ 0  1 ]

which is a simplified version of the form for DF̃ we have above. This has inverse

[ a  b ]^{-1}  =  (1/a) [ 1  -b ]  =  [ a^{-1}  -a^{-1}b ]
[ 0  1 ]                [ 0   a ]     [   0         1    ]

It turns out that the same type of expression works for the matrix we have consisting of four blocks, so that:

(DF̃)^{-1} = [ [DF_x]^{-1}   -[DF_x]^{-1} DF_t ]
             [      0                I         ]

Since g is the portion of (F̃)^{-1} which sends the second component of (F(x, t), t) to the first component of (x, t), Dg should come from the upper-right part of this matrix, giving the desired expression for Dg. (This approach requires knowing some linear algebraic properties of block matrices. Instead, the chain rule method described in Problem 7 of the practice problems is much more rigorous.)

Approximating implicit functions. Let us comment on one aspect of the Implicit Function Theorem which might seem troubling: the fact that the function g is usually only implicitly and not explicitly defined. Although explicitly knowing Dg and the fact that g satisfies F(g(t), t) = 0 is oftentimes good enough, other times having some better information about g itself is necessary. While we can almost never explicitly find g, it turns out that there are fairly good ways of approximating g.

The key comes from the contraction/fixed-point approach to the Inverse Function Theorem we outlined a few lectures ago. In the end, the function g we're after comes from inverting some function F, and finding F^{-1} (i.e. solving w = F(z) for z in terms of w) amounts to finding a fixed point of a function of the form

h(z) = z + [DF(a)]^{-1}(w - F(z)).

As we saw when looking at contractions and fixed points at the end of last quarter, this fixed point is obtained as the limit of a sequence z, h(z), h(h(z)), ... obtained by repeated application of h. Thus, we can approximate the implicit function g via an iterative procedure where we take some z and apply h over and over again. This description is a bit vague, but rest assured that there do exist honest workable algorithms and techniques which turn this idea into a concrete way of approximating implicit functions.
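To make this slightly more concrete, here is a minimal sketch in Python of the iteration applied to the made-up example F(x, t) = x^2 + t^2 - 1 near the base point (x_0, t_0) = (0.8, 0.6); here the "w" above is just 0, and the implicit function happens to be known explicitly (g(t) = √(1 - t^2)), so we can check the answer. This is only an illustration of the idea, not the general algorithm.

import math

# Hypothetical example: F(x, t) = x^2 + t^2 - 1, so F(x, t) = 0 implicitly
# defines x = g(t) = sqrt(1 - t^2) near the point (x0, t0) = (0.8, 0.6).
def F(x, t):
    return x**2 + t**2 - 1.0

x0, t0 = 0.8, 0.6
DFx = 2 * x0          # partial Jacobian of F in x at the base point (a 1x1 "matrix")

def g_approx(t, iterations=25):
    """Approximate the implicit function g(t) by iterating
    h(x) = x - [DF_x(x0, t0)]^{-1} F(x, t), starting from x0."""
    x = x0
    for _ in range(iterations):
        x = x - F(x, t) / DFx
    return x

t = 0.55
print(g_approx(t))               # iterative approximation
print(math.sqrt(1 - t**2))       # what g(t) actually is in this made-up example

The two printed values agree to many decimal places, since near x0 the map above is a contraction.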

Application 1. We finish by outlining some applications of the Implicit Function Theorem. To start with, the Implicit Function Theorem lies behind the fact that any randomly drawn sufficiently smooth surface in R^3 can be described using equations. This is clear for something like a sphere, which can be described using x^2 + y^2 + z^2 = R^2, and for other familiar surfaces, but is not so clear for a surface which does not come from such a recognizable shape. Nonetheless, the Implicit Function Theorem implies that such random-looking surfaces can indeed be described via equations, and in particular locally as the graphs of smooth functions. This gives rise to the idea that we can describe such surfaces parametrically, at least implicitly. The same is true for sufficiently smooth geometric objects in higher dimensions, and such applications are at the core of why computations in differential geometry work the way they do. Without the Implicit Function Theorem, we would really have no way of approaching higher-dimensional geometric structures in a systematic way.

Application 2. For an economic application, suppose we have some products we're producing at various quantities x = (x_1, ..., x_n) we have control over. But there are also variables we have no control over, say based on prices the market determines or the amount of labor available, and so on; denote these variables by t = (t_1, ..., t_k). Given some relation F(x, t) = 0 between these different variables, we look to be able to determine how to adjust our production in response to changes in the uncontrolled variables; in other words, we'd like to be able to solve for x in terms of t. Under some suitable non-degeneracy conditions the Implicit Function Theorem says that we can in fact do so. Moreover, even though we may not have an explicit expression for x in terms of t, we can indeed get an explicit expression for the rate of change of x with respect to t via the Jacobian Dg of the implicit function, so that we can predict how we should adjust production in order to maximize profit given a change in the market.

Application 3. The method of Lagrange multipliers is no doubt a basic technique you saw in a multivariable calculus course. The statement is that at a point a at which a function f : R^n → R achieves a maximum or minimum among points satisfying some constraint equations

g_1(x) = 0, ..., g_k(x) = 0,

there exist scalars λ_1, ..., λ_k such that

∇f(a) = λ_1 ∇g_1(a) + ... + λ_k ∇g_k(a).

The point is that if we want to optimize f subject to the given constraints, we should first try to solve the equation ∇f(a) = λ_1 ∇g_1(a) + ... + λ_k ∇g_k(a) in order to find the candidate points a at which maximums or minimums can occur. The derivation of the equation above, which must be satisfied at such a maximum or minimum, depends on the Implicit Function Theorem. The idea is that saying such scalars λ_i exist at a

50 critical point a amounts to saying that we can find (i.e. solve for) the λ i s in terms a. Applying the Implicit Function Theorem to the expression F (a, λ) = f(a) λ 1 g 1 (a) λ k g k (a) in the end gives λ = g(a), which says that such λ i s exist. Check the optional section at the end of Chapter 11 in the book for a complete derivation. Application 4. Finally, we give an application in the infinite-dimensional setting, which is quite powerful in modern mathematics. (We didn t talk about this example in class.) The Implicit Function Theorem we ve given applies to functions F : R n R k R n between finite dimensional spaces, but it turns out that there are version of this theorem for functions between infinitedimensional (even of uncountable dimension) spaces as well. We won t state this version, but will say that the book s proof of the Inverse Function Theorem does not generalize to this infinite dimensional setting since it involves solving systems of finitely many linear equations in terms of finitely many unknowns; instead, to get the infinite-dimensional version of the Inverse and hence Implicit Function Theorems you need to use the contraction/fixed-point approach, which does work in this setting since it applies to general metric spaces. A partial differential equation (abbreviated PDE) is an equation involving an unknown function f : R n R m and its partial derivatives. For instance, the equation u xx + u yy + u zz = u t for a function u(x, y, z, t) is called the heat equation and models those functions which describe the dissipation of heat in a 3-dimensional region. To solve this PDE means to find all such functions. In rare instances this is possible to do explicitly, but the behavior of such equations in general is complicated enough that explicit solutions are impossible to come by. Nonetheless, versions of the Implicit Function Theorem in infinite-dimensions guarantee that certain PDEs do have solutions and gives a way to parametrize the space of solutions. (The infinite-dimensional space being considered here is not solely made up of the variables x, but also consists of the functions being considered themselves; this is analogous to looking at metric spaces like C[a, b] from last quarter where the points in such spaces are actually functions.) Moreover, the fact that we can approximate these implicit solutions using iterations as described earlier gives methods for approximating solutions of PDEs, which in many applications is good enough. These ideas are crucial to understanding the various types of PDEs which pop-up across diverse disciplines: the Schrödinger equation in physics, the Black-Scholes equation in economics and finances, the wave equation in engineering, etc. The (infinite-dimensional) Implicit Function Theorem is the only reason we know how to work with such equations in general. Important. The Implicit Function Theorem has some truly amazing applications, most of the interesting of which go far beyond the scope of this course. April 27, 2015: Jordan Measurability Today we moved on to the integration portion of the course, starting with characterizing regions in R n which have a well-defined volume. Such regions will form the domains of integration for multivariable integrals. Moral of first four weeks. Before moving on, let us emphasize the key take-away from the first four weeks of the course: 50

locally, multivariable calculus is linear algebra.

The idea is that basically every concept we've seen when dealing with higher-dimensional differentiability stems from some corresponding concept in linear algebra: the linear algebra describes what happens infinitesimally at a point, and the calculus describes how that translates into some local property near that point. Indeed, here is a table of analogies to keep in mind, where the concept in the first column is meant to be the nonlinear analog of the linear concept in the second column:

Calculus (nonlinear)            Linear Algebra
differentiable function         matrix or linear transformation
chain rule                      matrix multiplication gives composition of linear transformations
Mean Value Equality             linearity of A: Ax - Ay = A(x - y)
Mean Value Inequality           Cauchy-Schwarz: ‖Ax - Ay‖ ≤ ‖A‖ ‖x - y‖
Inverse Function Theorem        invertible matrices define invertible linear transformations
Implicit Function Theorem       in a system of n equations in n + k unknowns of rank n, we can solve for n variables in terms of the other k variables

In particular, the last line, corresponding to the Implicit Function Theorem, was the point of the Warm-Up from last time. In a nutshell, whenever you have some nice linear algebraic fact available, there is likely to be some corresponding non-linear analog in calculus.

Regions of integration. Before looking at multivariable integration, we recall the single-variable integral. We consider functions f : [a, b] → R satisfying the property that the upper and lower sums can be made arbitrarily close, as expressed by the inequalities

U(f, P) - L(f, P) < ε

we saw back in the first quarter. A key point here is that we only defined integration over closed intervals [a, b]. More generally, we can extend the definition of the integral to finite unions of disjoint closed intervals, where we define, say, the integral of f over [a, b] ∪ [c, d] to be the sum of the integrals of f over [a, b] and over [c, d]:

∫_{[a,b] ∪ [c,d]} f(x) dx := ∫_{[a,b]} f(x) dx + ∫_{[c,d]} f(x) dx,

and similarly for finite unions of more than two closed intervals. However, this is as far as we can go with the Riemann integral of the first quarter. (On the last day of the first quarter we briefly looked at what's called the Lebesgue integral, which allows us to extend single-variable integration further, but that is not what we are talking about here.)

What all of these regions of integration, obtained as finite unions of disjoint closed intervals, have in common is that they all have a well-defined length: indeed, the length of [a, b] ∪ [c, d], when this union is disjoint, is just the length of [a, b] plus the length of [c, d]. Thus, the upshot is that the single-variable Riemann integral is only defined over subsets of R with a well-defined length. By analogy, we might expect that a similar thing should be true for integration in higher-dimensional spaces, only replacing length by volume. (In the case of R^2, volume just means area.) This is correct, but it requires that we give a precise notion of what it means for a subset of R^n to have a volume, a complication which wasn't readily apparent in the case of single-variable integration.

52 Jordan outer sums. How can we precisely measure the volume of a subset of Rn? First, the set should definitely be bounded, which implies that we can encase it in a large enough rectangular box: R = [a1, b1 ] [a2, b2 ] [an, bn ], which denotes the region in Rn consisting of points (x1,..., xn ) where the i-th coordinate xi is in the closed interval [ai, bi ]. In the case of R2 this gives an ordinary rectangle. (Most pictures we draw from now on will take place in R2, since things are simpler to visualize there.) But we know geometrically that the volume, whatever that means, of such a box should just be obtained by multiplying the lengths of the various edges, so we define the volume R of R to be: R = (b1 a1 ) (bn an ). (Once we give a precise notion of volume we will show that this is indeed the correct volume of a rectangular box; for now, we think of R as simply being some quantity which we expect to be volume based on geometric intuition.) The idea now is that, if E R is a region which we want to define the volume of, we can use rectangular boxes which get smaller and smaller to better and better approximate the volume of S; if we get a well-defined number by doing so, then that number should be defined to be the volume of S. Here are the first definitions. A grid G on R is a partition of R obtained by partitioning (in the sense of how partition was used in the first quarter) each of the edges [ai, bi ] making up R. Each grid cuts R up into smaller rectangular boxes: Clearly, adding together the volumes Ri of all the Ri will just give the volume R of R, so to bring the region E into play we should only take those smaller rectangular boxes which intersect E: 52

We define the outer sum V(E; G) to be the sum of the volumes of all such rectangular boxes:

V(E; G) = Σ_{R_i ∩ E ≠ ∅} |R_i|.

The idea is that such an outer sum overestimates the volume of E. Now, as the grid G gets finer (meaning that we split the small rectangular boxes making up G into even smaller pieces), the outer sum also gets smaller, since when splitting up some R_i we might be left with pieces which no longer intersect E, meaning that they will no longer contribute to the new outer sum. We expect that as the grid gets finer and finer, the outer sums provide a better and better approximation to the volume of E, and that in the limit they should approach said volume. Thus we guess that we should define the volume of E as the infimum of all such outer sums:

Vol(E) := inf{V(E; G) : G is a grid on R},

where R is a rectangular box containing E.

Clarifying example. The above procedure almost works, only that there are sets E for which the resulting volume does not behave how we expect volume to behave. In particular, one key property we would expect is that if A and B are disjoint, then the volume of A ∪ B should be obtained by adding together the volumes of A and B separately:

Vol(A ∪ B) = Vol A + Vol B.

Indeed, it makes intuitive sense, in R^3 at least, that if we take some solid with a well-defined volume and break it up into two pieces, the sum of the volumes of those pieces should give the volume of the original solid. However, here is an example where this equality cannot hold. Take E ⊆ [0, 1] × [0, 1] to be the subset of the unit square in R^2 consisting of points with rational coordinates:

E = {(x, y) ∈ [0, 1] × [0, 1] : x, y ∈ Q}.

Given any grid G on [0, 1] × [0, 1], any smaller rectangle R_i contains a point of E since Q^2 is dense in R^2, so every smaller rectangle of the grid will contribute to V(E; G), giving:

V(E; G) = Σ_{R_i ∩ E ≠ ∅} |R_i| = Σ_{R_i} |R_i| = Vol([0, 1] × [0, 1]) = 1.

(Volume in this case just means area.) But note that we get the same result using the complement E^c of E: since the irrationals are dense in R, any R_i will also contain a point of E^c, so every such rectangle contributes to the outer sum V(E^c; G), giving:

V(E^c; G) = Σ_{R_i ∩ E^c ≠ ∅} |R_i| = 1

as well. Thus the infimum of such outer sums will be 1 for both E and E^c, so even though [0, 1] × [0, 1] = E ∪ E^c, we have:

Vol(E) + Vol(E^c) = 2 ≠ 1 = Vol(E ∪ E^c)

if Vol(E) and Vol(E^c) are defined as in our attempt above.

Regions of zero volume. The above example shows that there are regions for which our attempted definition of volume gives something which does not necessarily behave how we expect volumes to behave. The issue in that example is that both E and E^c are dense in [0, 1] × [0, 1], which is what ends up giving a value of 1 for all outer sums. Last quarter we saw that this property is equivalent to the claim that ∂E = [0, 1] × [0, 1], and so the point is that in some sense the boundary of E is too large to allow for E to have a well-defined volume. The upshot is that volume should only be defined for regions which have as small a boundary as possible, which we will now interpret as meaning that the boundary should have volume zero. We can make this precise as follows:

We say that a bounded set S ⊆ R^n contained in a rectangular box R has Jordan measure zero (or zero volume) if for any ε > 0 there exists a grid G on R such that V(S; G) < ε.

The intuition is that we can cover S by small enough rectangles which have total area less than any given small quantity; being able to do this would imply that if Vol S were defined, it would have to satisfy Vol S < ε for all ε > 0, which means that Vol S would have to be zero. For example, a continuous curve drawn in R^2 will have Jordan measure zero, since intuitively we can cover it with finitely many small rectangles of arbitrarily small total area. Note that we are not talking here about the area of the region enclosed by the curve, but rather about the area of the curve itself, which should be zero since a curve only has length but no width. Similarly, a 2-dimensional surface in R^3 should have Jordan measure zero in R^3, where we use small rectangular boxes to cover the surface. Note that a square such as [0, 1] × [0, 1] does

55 not have Jordan measure zero when viewed as a subset of R 2 but that it does have Jordan measure zero when viewed as a subset of R 3. The point is that this notion of measure zero is relative to the ambient space the region in question is sitting inside of. Jordan measurability. Now armed with the notion of zero volume, we can characterize the types of regions which have well-defined volumes, with the intuition being that these are the regions whose boundaries are as small as possible: We say that a bounded set E R n is Jordan measurable (or is a Jordan region) if its boundary E has Jordan measure zero. In this case, we define the volume (or Jordan measure of E to be the infimum of all outer sums obtained by varying through all possible grids covering a rectangular box R containing E: Vol(E) = inf{v (E; G) G is a grid on R}. These will be the regions over which the notion of Riemann integration will make sense in the multivariable setting. After having defined integration in this setting, we will see a better reason as to why Jordan measurable sets are indeed the right types of regions to consider. Important. Volumes in R n are computed using grid approximations, where we define V (E; G) = R i and Vol(E) = inf{v (E; G) G is a grid on R} R i E for a region E contained in a rectangular box R. This only makes sense for regions whose boundary has volume (or measure) zero, meaning that V ( E; G) can be made arbitrary small by choosing appropriate grids. A comment on the term measure. The book does not use the phrases Jordan measure zero, Jordan measurable, nor Jordan measure, instead using zero volume, Jordan region, and volume respectively. But, I think emphasizing the term measure is important since it emphasizes the connection between what we re doing and the more general subject of measure theory, which provides the most general theory of integration available. The word measure in this sense is simply meant to be a general notion of volume. To say a bit more, note that in our approach we started with outer sums which overestimate the volume we re interested in. Similarly, we can define inner sums by underestimating this volume, using rectangles which lie fully inside E as opposed to ones which can extend beyond. In this case, as grids get finer and finer, inner sums get larger and larger, so it is the supremum of the inner sums which should approach the volume we want. For a region to have a well-defined volume we should expect that the outer sum and inner sum approach give the same value, meaning that inf{outer sums} = sup{inner sums}. The infimum on the left is more generally called the outer Jordan measure of E and the supremum on the right is the inner Jordan measure, and a set should be Jordan measurable precisely when these two numbers are the same. (For the clarifying example above, the inner Jordan measure turns out to be zero since the interior of the set in question is empty.) It turns out that the outer and inner Jordan measure agree if and only if E has Jordan measure zero, which thus reproduces the definition we gave of Jordan measurability. The idea is that that volume of E should be sandwiched between the inner and outer Jordan measures since when subtracting inner sums from 55

56 outer sums you re left only with rectangular boxes which cover E. Thus the inner and outer Jordan measures agree if and only if the difference between the inner and outer sums can be made arbitrarily small, which says that the volume of E can be made arbitrarily small. We won t look at these notions of inner sums and measures in class, but the book has some optional material you can look at if interested. In all of our definitions we only consider grids which consist of finitely many rectangular boxes, but in more generality we can consider collections of countable infinitely many rectangular boxes (at least those for which the infinite sum R i R i converges) covering a given region the resulting measure obtained is what s called the Lebesgue measure of a region in R n, so what we re doing is essentially a simplified version of Lebesgue measure theory, namely the simplification where we only allow finitely many rectangular boxes. Again, the book has some optional sections which elaborate on this if interested. May 1, 2015: Riemann Integrability Today we started talking about integrability of multivariable functions, using an approach analogous to what we saw for single-variable functions back in the first quarter. For the most part we get the same properties and facts we had in the single-variable case, although there are some new things which happen in the multivariable setting which we ll elaborate on next time. Warm-Up 1. One of the main reasons we restrict the types of regions we want to consider to those whose boundaries have volume zero is the desire for Vol(A B) = Vol A + Vol B to be true when A and B are disjoint. We show that this equality holds in the simpler case where A and B are closed Jordan measurable subsets of R 2, although it is in fact true for any disjoint Jordan measurable sets. (In fact, it s true as long as A B has Jordan measure zero!) We use the following picture as a guide: (The point of requiring that A and B are closed and hence compact and disjoint is to guarantee that there is some positive distance between them.) The idea is that for fine enough grids, the small rectangles making up the grid can be separated into only those which cover A and only those which cover B, as in the picture above. First, we need to know that if A and B are Jordan measurable, A B is as well. This is done in the book, but for completeness here is the argument. Since A and B are Jordan measurable, A and B have Jordan measure zero, so for a fixed ɛ > 0 there exist grids G 1 and G 2 such that V ( A; G 1 ) < ɛ 2 and V ( B; G 2 ) < ɛ 2. 56

57 Since (A B) A B, for a grid G finer than both G 1 and G 2 we have: V ( (A B); G) V ( A; G) + V ( B; G) < ɛ 2 + ɛ 2 = ɛ, where the first inequality comes from separating out the rectangles which cover (A B) into the union of those which cover A with those which cover B. (If G 1 and G 2 are not fine enough, some rectangles might cover both A and B, which is why we can only guarantee a non-strict inequality in this step.) Thus (A B) has Jordan measure zero, so A B is Jordan measurable as claimed. Now, take a fine enough grid G where all small rectangles have diagonal length less than the distance between A and B, which guarantees that any small rectangle can only intersect A or B but not both simultaneously. Thus we can separate the rectangles R i in this grid which intersect A B into those which intersect A and those which intersect B with no overlap between these, so: V (A B; G) = R i = R i + R i = V (A; G) + V (B; G). R i (A B) R i A R i B Hence taking infimums of both sides of this equality once G is fine enough gives inf{v (A B; G)} = inf{v (A; G)} + inf{v (B; G)}, so Vol(A B) = Vol A + Vol B as claimed. (In the more general setting where A and B are not necessarily closed nor disjoint but A B still has Jordan measure zero, we would have to separate the rectangles into three categories: those which intersect A but not B, those which intersect B but not A, and those which intersect A B. In this case, the contribution to V (A B; G) coming from the third type of rectangle can be made arbitrarily small using the fact that A B has Jordan measure zero, so that when taking infimums this contribution can be ignored. There are details to fill in, but that s the basic idea for this general case.) Important. If A, B R n are Jordan measurable, then A B is Jordan measurable. If in addition A B has Jordan measure zero, then Vol(A B) = Vol A + Vol B. Banach-Tarski Paradox. As an aside, we give an example which truly illustrates the importance of the of the property given in the first Warm-Up, which seems like it should be an obvious fact. Here is the result, which is known as the Banach-Tarski Paradox: it is possible to take a solid unit ball, break it up into a finite number of pieces, and rearrange those pieces to end up with two solid unit balls! To be clear, this is a true statement and the paradox in the name only refers to the paradoxical observation that it seems as if we ve doubled the total volume we started out with through this process. Indeed, if this could be done in the real world this would say that you could turn a solid bar of gold into two solid bars of gold, each of the same size as the original one. The point is that the pieces which we break the sphere up into originally will not be Jordan measurable and so do not have a well-defined volume, thus since at some point in the process you work with sets for which volume does not make sense there is no reason to expect that the total volume you end up with at the end should be the same as the one you started with. This results shows why care has to be taken when making claims such as the one in the Warm-Up. (Here s a joke for you all: What s the anagram of Banach-Tarski? Banach-Tarski Banach-Tarsiki.) Warm-Up 2. We show that the unit disk B 1 (0) in R 2 is Jordan measurable by showing that the unit circle, which is the boundary of B 1 (0), has Jordan measure zero. Here we do this by finding 57

explicit grids which do the job, but later we will see a quicker approach, which generalizes to show that any continuous curve has Jordan measure zero.

Given some ε > 0 we want to cover the unit circle by a finite number of small rectangles whose total summed area is smaller than ε. For each n ≥ 2, take the point (1, 0) together with the other 2^n − 1 points which all together give the vertices of a convex regular 2^n-gon inscribed in the circle. Let G_n be the grid on [−1, 1] × [−1, 1] determined by all these points, meaning start with the small rectangles which have adjacent vertices as corners and then fill in the rest of the grid by translating these. Focus on the small rectangle which has its lower right corner at (1, 0). The central angle of the 2^n-gon is 2π/2^n = π/2^{n−1}, so this rectangle has height sin(π/2^{n−1}) and base length 1 − cos(π/2^{n−1}), and hence has area sin(π/2^{n−1})[1 − cos(π/2^{n−1})]. All together there are 2^n rectangles in our grid which touch the unit circle (namely those with a corner at one of the vertices of the 2^n-gon we used), and by symmetry all have the same area. Thus for this grid we get:

V(∂B_1(0); G_n) = Σ_{R_i ∩ ∂B_1(0) ≠ ∅} |R_i| = 2^n sin(π/2^{n−1}) [1 − cos(π/2^{n−1})].

It can be shown, say using L'Hopital's rule, that this converges to 0 as n → ∞, implying that for any ε > 0 there exists n such that V(∂B_1(0); G_n) < ε, which shows that ∂B_1(0) has Jordan measure zero as desired. Again, after we've spoken about various ways of interpreting integrability, we will come back and show that the unit circle has volume zero without having to explicitly construct any grids.
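As a quick numerical check on this (a rough sketch in Python which just evaluates the displayed expression), the outer sums really do shrink to 0:

import math

def outer_sum(n):
    """The outer sum 2^n * sin(pi/2^(n-1)) * (1 - cos(pi/2^(n-1))) from the grid G_n."""
    theta = math.pi / 2**(n - 1)
    return 2**n * math.sin(theta) * (1 - math.cos(theta))

for n in [2, 4, 6, 8, 10, 12]:
    print(n, outer_sum(n))
# The values shrink roughly like pi * (pi/2^(n-1))^2, so they can be made
# smaller than any given epsilon by taking n large enough.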

Upper and lower sums. Given a Jordan region E ⊆ R^n and a bounded function f : E → R, we construct the Riemann integral of f over E using upper and lower sums in a similar manner as we did for single-variable functions. The only difference is that while in the single-variable case we only integrated over intervals, now we are integrating over a Jordan region which may not be a rectangular box, even though the upper and lower sums we want should depend on grids defined over rectangular boxes. Thus, we need some way of extending f, which is only a priori defined over E, to a function defined on an entire rectangular box containing E.

We proceed as follows. Pick a rectangular box R containing E and extend f to a function on all of R by defining it to be zero outside of E. The point of doing so is to ensure that the integral we end up with will only depend on how f behaves over E, since outside of E it is zero and zero functions contribute nothing to integrals. With this extended function in mind, given a grid G on R we define upper and lower sums just as we did in the single-variable case, only now summing over the small rectangular boxes arising from G which intersect E:

U(f; G) = Σ_{R_i ∩ E ≠ ∅} ( sup_{x ∈ R_i} f(x) ) |R_i|   and   L(f; G) = Σ_{R_i ∩ E ≠ ∅} ( inf_{x ∈ R_i} f(x) ) |R_i|.

Again to be clear, even if the given f : E → R can actually already be viewed as a function defined on a larger domain, for the purposes of these definitions we nonetheless define the extension of f outside of E to be zero; for instance, if f were the constant function 1, which is clearly defined on all of R^n, when constructing the integral of f over E we change the values of f outside of E to be zero instead of 1 to ensure that the integral only depends on E.

Integrability. Analogous to the single-variable case, as grids get finer upper sums can only get smaller and lower sums can only get larger. Thus we define the upper and lower integrals of f over E as:

(U)∫_E f(x) dx = inf{U(f; G) : G is a grid on R}   and   (L)∫_E f(x) dx = sup{L(f; G) : G is a grid on R}.

We say that f is (Riemann) integrable over E if the upper and lower integrals agree, and define this common value to be the integral of f over E, which we denote by either ∫_E f(x) dx or ∫_E f dV. As expected from a multivariable calculus course, in the two-variable case this gives the signed volume of the region in R^3 between the graph of f and the Jordan region E in the xy-plane: the upper sums overestimate this volume while the lower sums underestimate it.

Alternate characterization. As in the single-variable case, we have the following alternate characterization: a bounded function f : E → R on a Jordan measurable region E ⊆ R^n inside of a rectangular box R is integrable if and only if for any ε > 0 there exists a grid G on R such that

U(f; G) − L(f; G) < ε.

The proof of this is exactly the same as in the single-variable case.

Important. The definition of the Riemann integral in the multivariable setting works in the same way as in the single-variable case, only that before constructing upper and lower sums we define the function in question to be zero outside of the region of integration to ensure that this outside region does not contribute to the integral.
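To see these definitions in action, here is a small computational sketch in Python; the choice of the constant function 1 and the unit disk as the Jordan region is just for illustration. For this particular f the sup and inf of the zero-extension over each cell can be found exactly, so the quantities computed below really are the upper and lower sums for a uniform grid on [-1, 1]^2:

import math

def upper_lower_sums(m):
    """Upper and lower sums of the constant function 1 over the closed unit
    disk E, extended by zero outside E, using the m x m grid on [-1, 1]^2.
    For this f, sup over a cell is 1 exactly when the cell meets E, and
    inf over a cell is 1 exactly when the cell lies entirely inside E."""
    h = 2.0 / m
    U = L = 0.0
    for i in range(m):
        for j in range(m):
            x0, x1 = -1 + i * h, -1 + (i + 1) * h
            y0, y1 = -1 + j * h, -1 + (j + 1) * h
            # nearest and farthest points of the cell from the origin
            nx = min(abs(x0), abs(x1)) if x0 * x1 > 0 else 0.0
            ny = min(abs(y0), abs(y1)) if y0 * y1 > 0 else 0.0
            fx = max(abs(x0), abs(x1))
            fy = max(abs(y0), abs(y1))
            area = h * h
            if nx * nx + ny * ny <= 1:      # cell intersects E: sup = 1
                U += area
            if fx * fx + fy * fy <= 1:      # cell contained in E: inf = 1
                L += area
    return U, L

for m in [10, 40, 160]:
    print(m, upper_lower_sums(m))
# Both sums squeeze toward pi, the area of the disk, as the grid gets finer.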

Example. Define f : [0, 1] × [0, 1] → R to be a function which has the value 1 at two points p and q of the unit square and the value 0 elsewhere. We claim that this is integrable with integral zero. The value of zero for the integral comes from the fact that the lower sum corresponding to any grid is always 0, since the infimum of f over any small rectangle will always be zero. Thus (using dA instead of dV, as is customary in the two-variable case):

(L)∫_{[0,1]×[0,1]} f dA = sup{0} = 0,

and so if f is integrable this will necessarily be the value of ∫_{[0,1]×[0,1]} f dA.

To show that f is integrable we must show that given any ε > 0 we can find a grid G such that

U(f; G) − L(f; G) = U(f; G) < ε.

But in any grid, the only small rectangles R_p and R_q which will contribute something nonzero to the upper sum are those which contain the points p and q at which f has the value 1, since f is zero everywhere else; hence the upper sum will look like:

U(f; G) = ( sup_{x ∈ R_p} f(x) ) |R_p| + ( sup_{x ∈ R_q} f(x) ) |R_q| = |R_p| + |R_q|.

Thus we can use a similar idea to the one we used in the single-variable case: make R_p and R_q small enough to balance out the supremum value of 1 in order to make the entire sum smaller than ε. Thus, for ε > 0, pick a rectangle R_p ⊆ [0, 1] × [0, 1] around p whose area is less than ε/2 and a rectangle R_q ⊆ [0, 1] × [0, 1] around q whose area is less than ε/2. If necessary, make these rectangles smaller to ensure that they do not intersect each other. Letting G be a grid which contains these two as subrectangles, we get:

U(f; G) − L(f; G) = |R_p| + |R_q| < ε/2 + ε/2 = ε,

showing that f is integrable on [0, 1] × [0, 1] as claimed.
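A tiny numerical illustration of this (a sketch in Python; the specific points p and q below are arbitrary made-up choices): on a uniform n × n grid, p and q each lie in a cell of area 1/n^2, so the upper sum is at most 2/n^2, which already makes it clear how to get below any ε.

# Upper sum of the "two spikes" function on a uniform n x n grid of [0,1]^2.
# p and q are hypothetical sample points; only the cells containing them matter.
p, q = (0.3, 0.7), (0.62, 0.11)

def upper_sum(n):
    cell_area = 1.0 / n**2
    cells = {(int(p[0] * n), int(p[1] * n)), (int(q[0] * n), int(q[1] * n))}
    return len(cells) * cell_area   # at most two cells contribute, each with sup f = 1

for n in [1, 10, 100, 1000]:
    print(n, upper_sum(n))   # 1.0, 0.02, 0.0002, 2e-06, ... -> 0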

61 May 4, 2015: More on Integrability Today we continued talking about integrability in higher-dimensions, elaborating on a key new aspect which doesn t have a direct single-variable analog. We also gave yet another characterization of integrability which makes precise the geometric intuition that integrals should give volumes of regions between graphs and domains of integration; this also works in the single-variable case, we just didn t have enough machinery available in the first quarter to state the claim. Warm-Up. Let E R n be a Jordan region. We show that the constant function 1 is integrable over E. On the one hand, this will be a consequence of the soon-to-be-mentioned fact that continuous functions on compact domains are always integrable, but we ll work it out here directly in order to demonstrated the subtleties involved. Based on what we know from a multivariable calculus course, we would expect that 1 dv = Vol E, E which is indeed true; we won t prove this in detail here but it s worked out in the book. We ll assume n = 2 for the sake of being able to draw nice pictures, but the general case is the same. Given a grid G on a rectangle R containing E, we want to work out what U(f; G) L(f; G) looks like. Recall that in order to define these upper and lower sums, we extend the given function to be zero outside of E; denote this extended function by f, so that f = 1 on E and f = 0 outside of E: Note that we can separate the subrectangles R i making up G into three categories: those which are fully contained in the interior of E, those which are fully contained in the interior of E c, and those which intersect E. Over the first type f always the value 1, so sup f inf f = 0 in this case. Hence such rectangles contribute nothing to U(f; G) L(f; G). Over the second type, f always has the value 0 so again sup f inf f = 0, meaning that these rectangles also do not contribute to upper sum minus lower sum. Thus the only nonzero contributions to upper sum minus lower sum can come from the rectangles which intersect the boundary of E: U(f; G) L(f; G) = (sup f inf f) R i. R i E 61

62 Now, any such R i contains something in E, so sup f = 1 on R i, and something not in E, so inf f = 0 on R i. Thus we get U(f; G) L(f; G) = R i = V ( E; G). R i E Since E is a Jordan region, E has volume zero, so for any ɛ > 0 there exists G such that this final expression is less than ɛ, and hence for this grid we have U(f; G) L(f; G) < ɛ, showing that 1 is integrable over E. The fact that the value of the integral is actually Vol E can be derived from the fact that L(f; G) V (E; G) U(f; G) for any grid G on R, so that the infimum of the terms in the middle (i.e. the volume of E) will be the common value of the supremum of the terms on the left and the infimum of the terms on the right. It would be good practice to understand why these final inequalities hold. In particular, the fact that L(f; G) V (E; G) depends on the fact that we extended the constant function 1 to be zero outside of E: if we had simply kept the value outside of E as 1 we would get that L(f; G) = U(f; G) and so the lower sums would no longer underestimate the value of the integral as we expect them to do. Integrability in terms of Jordan measurability. When f is integrable over E, given ɛ > 0 we know that we can find a grid G such that U(f; G) L(f; G) < ɛ. Note what this difference looks like in the single-variable case, which we mentioned in passing back in the first quarter. For f : [a, b] R integrable (actually continuous in the picture) we get: But, encasing the graph of f inside of a rectangle and viewing these small rectangles as the portions of a grid G which intersect the graph of f, we can now interpret this upper sum minus lower sum as an outer sum for the graph of f in R 2 : U(f, P ) L(f, P ) = V (graph f; G). Thus, the fact that we can make upper sum minus lower sum arbitrarily small says that we can find grids which make the outer sum for graph f arbitrarily small, which says that graph f has Jordan measure zero in R 2! The same idea works for the graph of any integrable function over a Jordan region in R n, so we get that: 62

63 If a bounded function f : E R is integrable over a Jordan region E R n, then the graph of f has Jordan measure zero in R n+1. This hints at a deep connection between integrability and Jordan measurability, and indeed we can now give the precise statement, which also applies in the single-variable case. For a nonnegative bounded function f : E R on a Jordan region E R n, define the undergraph of f to be the region in R n+1 lying between the graph of f and the region E in the hyperplane x n+1 = 0 where (x 1,..., x n+1 ) are coordinates on R n+1. (So, when n = 1 or 2, the undergraph is literally the region under the graph of f and above E.) Then we have: f is integrable over E if and only if the undergraph of f is Jordan measurable, in which case the integral of f over E equals the volume of the undergraph. Thus, we see that the basic intuition we ve used all along (going back to the first quarter) when defining integration is the correct one: to say that E f dv exists should mean the region under the graph of f has a well-defined volume, and the value of the integral should be equal to this volume. Hence, Riemann integrability and Jordan measurability are indeed closely linked to one another. The proof of this final equivalence is left as an optional problem on the homework, but the idea is as follows. The boundary of the undergraph should consist of three pieces: the graph of f on top, the Jordan region E on the bottom, and the sides obtained by sliding the boundary of E up until you hit the graph of f. Each of these pieces should have Jordan measure zero in order for the undergraph to be Jordan measurable, and the fact that each piece does so relates to something showing up in the definition of integrability: that the bottom E has Jordan measure zero in R n+1 is related to the fact that E is Jordan measurable in R n, that the sides have Jordan measure zero is related to the fact that E has Jordan measure zero in R n, and that the top (i.e. the graph of f) has Jordan measure zero relates to the fact that we can make upper sums minus lower sums arbitrarily small. Important. The graph of an integrable function has Jordan measure zero, and integrability is equivalent to the undergraph being Jordan measurable: the Jordan measure of the undergraph is equal to the integral of f. Properties analogous to single-variable case. Multivariable integrals have the same types of properties (with similar proofs) that single-variable integrals do. Here are a few key ones: continuous functions on compact domains are integrable (the proof, as in the single-variable case, uses the fact that continuous functions on compact domains are uniformly continuous), sums and scalar multiples of integrable functions are integrable, integrals of sums split up into sums of integrals, and constants can be pulled outside of integrals regions of integration can be split up when each piece is Jordan measurable, and the integrals split up accordingly integrals preserve inequalities: if f g and each is integrable, then f g. Check the book for full details, but again the proofs are very similar to the ones for the analogous single-variable claims. Projectable regions. We didn t talk about projectable regions in class, but it s worth mentioning. You can check the book for more details. A bounded subset E R n is projectable essentially if its 63

64 boundary can be described using graphs of continuous functions, or in other words if we can view E as the region enclosed by different graphs of continuous functions. (Check the book for the precise definition.) For instance, the unit disk in R 2 is projectable since the top portion of its boundary is the graph of y = 1 x 2 and the bottom portion of its boundary is the graph of y = 1 x 2. The basic fact is that projectable regions are always Jordan measurable. Indeed, since the boundary of such a region consists of graphs of continuous functions, it is enough to show that each such graph has Jordan measure zero. But continuous functions are always integrable, so the characterization of integrability in terms of Jordan measurability implies that such a graph indeed has Jordan measure zero, and we are done. This gives a much quicker proof of the fact that the unit circle has Jordan measure zero as opposed to the proof we gave last time in terms of explicit grids: we can find grids which make the outer sums for the top half and bottom half each arbitrarily small since these halves are each graphs of continuous functions, so putting these grids together gives grids which make the outer sum for the entire circle arbitrarily small. More generally, all standard types of geometric objects you re used to seeing in R 2 and R 3 (spheres, cones, ellipses, paraboloids, hyperboloids, etc.) are projectable regions and hence are Jordan measurable. To argue along these lines, note that if we are given some nice enough continuous (or better yet C 1 ) curve in R 2 or surface in R 3, something like the Implicit Function Theorem (!!!) will imply that we can view this object locally as being made up of graphs of continuous functions, and so the same reasoning as above will imply that this is Jordan measurable. Regions of volume zero don t matter. Although there are many similarities between multivariable and single-variable integrals, here is one which does not have a direct analog in the single-variable case, at least using the Riemann integral. (This DOES have a direct analog when using the single-variable Lebesgue integral, but this is not a topic we ll explore further.) The difference is that in the multivariable case we can integrate over more general types of regions than we can in the single-variable case. Here is the claim: if E R n has Jordan measure zero, then any bounded function f : E R is integrable and E f dv = 0. Thus, regions of volume zero can never contribute anything to values of integrals. The proof depends on Theorem in the book, which essentially says that for any Jordan region E, upper and lower sums (and hence upper and lower integrals) can be approximated to whatever degree of accuracy we want using only subrectangles in a grid which are contained fully in the interior of E. The idea is that in any grid we can separate the rectangles intersecting E into those contained in the interior and those which intersect the boundary, and the fact that boundary has volume zero can be used to make these contributions negligible, so that only the contributions from those rectangles in the interior actually matter. If E has empty interior, then this implies that the upper and lower sums themselves can be made arbitrarily small (since they can be approximated by a bunch of zeroes), so that E f dv = 0. In particular, it can be shown that if E has volume zero, then it must have empty interior and we have our result. 
Since regions of volume zero never contribute to integrals, it follows that if we take an integrable function and change its values only over a set of volume zero, the resulting function is still integrable and has the same integral as the original one. The idea is as follows. If f : E → R is integrable and S ⊆ E has volume zero, then we can split up the integral of f over E as:

∫_E f dV = ∫_S f dV + ∫_{E\S} f dV,

assuming that E \ S is also Jordan measurable, which it will be. The first integral on the right is zero, and so

∫_E f dV = ∫_{E\S} f dV,

meaning that the entire value of ∫_E f dV comes solely from the region outside S. If g : E → R has the same values on E \ S as does f, then ∫_{E\S} f dV = ∫_{E\S} g dV and the claim follows. This is all clarified in detail in the book, and is an important fact to remember. We saw something analogous in the first quarter, where a homework problem showed that changing the value of an integrable function at a finite number of points does not affect the value of the integral, but now we have the most general statement available.

Important. Integrating over a region of volume zero (or with empty interior) always gives the value zero, so that regions of volume zero never actually contribute to integrals. Thus, altering an integrable function on a set of volume zero does not affect the existence nor the value of the integral.

May 6, 2015: Fubini's Theorem

Today we spoke about Fubini's Theorem, which gives a way to compute multivariable integrals using iterated integrals. No doubt you did this plenty of times in a previous multivariable calculus course, but here we delve into the reasons as to why this actually works since, at first glance, the definitions of multivariable integrals and iterated integrals are quite different.

Warm-Up. Define f : [0, 1] × [0, 1] → R by

f(x, y) = { 1 − y      if y ≥ x
          { 2 − 2x^2   if y < x.

We show that f is integrable. To be clear, the graph of f is a plane over the upper-left half of the square and looks like a downward curved parabolic surface over the bottom-right half. Over the upper-left half of the square, f(x, y) = 1 − y is continuous and so is integrable. Over the bottom-right half, f(x, y) equals the continuous function g(x, y) = 2 − 2x^2 except along the diagonal. Since the diagonal has Jordan measure zero, what happens along the diagonal does not affect integrability, so since f equals an integrable function except on a set of volume zero, it too is integrable. Thus f is integrable over the upper-left half of the square and over the bottom-right half, so it is integrable over the entire square. To be clear, this is a property we mentioned last time and which is proved in the book: if f is integrable over a Jordan region A and also over a Jordan region B such that A ∩ B has Jordan measure zero, then f is integrable over A ∪ B.
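As an aside, here is a quick numerical sketch in Python estimating the double integral of this Warm-Up function by a midpoint Riemann sum on a uniform grid; if the iterated integration is done by hand, the value should come out to 1/6 + 1/2 = 2/3, and the sums below do approach that.

def f(x, y):
    return 1 - y if y >= x else 2 - 2 * x**2

def riemann_sum(n):
    """Midpoint Riemann sum for f over [0,1]^2 on a uniform n x n grid.
    This is squeezed between the lower and upper sums for the same grid."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        for j in range(n):
            x = (i + 0.5) * h
            y = (j + 0.5) * h
            total += f(x, y) * h * h
    return total

for n in [10, 100, 1000]:
    print(n, riemann_sum(n))   # tends to about 0.6667 (= 2/3 by iterated integration)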

66 Iterated integrals. For f : [a, b] [c, d] R, the iterated integrals of f are the expressions: d ( b ) b ( d ) f(x, y) dx dy and f(x, y) dy dx. c a (For simplicity, we will only give the definitions and results we ll look at in the two-variable case, even though the same applies to functions of more variables.) Computing these is what you no doubt spent plenty of time doing in a multivariable calculus course, where you first compute the inner integral and then the outer integral. We are interesting in knowing when these computations give the value of the multivariable integral f(x, y) d(x, y) [a,b] [c,d] we ve defined in terms of two-dimensional upper and lower sums. (We use two integrals symbols in the notation here simply to emphasize that we are integrating over a 2-dimensional region.) Here are some basic observations. First, in order for the iterated integrals to exist we need to know that that inner integrals exist. In the case of: b a a f(x, y) dx, this requires knowing that for any fixed y [c, d], the function x f(x, y) is integrable over the interval [a, b]; it is common to denote this function by f(, y), where y is fixed and the indicates the variable we are allowed to vary. Similarly, in order for the inner integral d c f(x, y) dy in the second iterated integral to exist we need to know that for any x [a, b], the singlevariable function f(x, ) is integrable over [c, d]. In addition, in order for the double integral [a,b] [c,d] f(x, y) d(x, y) to exist we need to know that f is integrable (in the two-variable sense) over [a, b] [c, d]. It turns out that when all these hypotheses are met all integrals in question are indeed equal, which is the content of Fubini s Theorem. Before giving a proof, we look at some examples which illustrate what can happen when some of the requirements are not met. Example: iterated integrals need not be equal. (This example is in the book; here we re just making the idea a little more explicit.) Define the function f : [0, 1] [0, 1] R according to the following picture: c 66

67 So, all nonzero values of f are positive or negative powers of 2: start with 2 2 in the upper-right quarter of the unit square, then 2 3 for the left-adjacent eighth ; then move down to the next smaller square along the diagonal and give f the value 2 4, then 2 5 for the next left-adjacent rectangle of half the size; then take the vale 2 6 for the next diagonal square, and 2 7 for the left-adjacent half, and so on, and everywhere else f has the value zero. We claim both iterated integrals of this function exist but are not equal. Consider 1 f(x, y) dx 0 for a fixed y [0, 1]. Along the horizontal line at this fixed y the function f has only two possible nonzero values: either a positive power of 2 or the negative next larger power of 2. (So in particular f(, y) is integrable for any y [0, 1].) For instance, in the picture: at the first fixed y value f only has the value 4 or 8. But the interval along which it has the value 8 has length 1 4 and the interval along which it has the value 4 has length 1 2, so at this fixed y we get: 1 ( ) ( ) 1 1 f(x, y) dx = = At the next fixed y in the picture, f only has the values 32 and 16, but the value 32 occurs along an interval of length 1 8 and the value 16 along an interval of length 1 4, so at this y we get 1 ( ) ( ) 1 1 f(x, y) dx = = as well. The same thing happens at any fixed y, so in the end 1 0 f(x, y) dx = 0 for any y [0, 1]. Thus: f(x, y) dx dy = 0 dy = Now we consider the other iterated integral. For the inner integral we now fix x [0, 1] and look at what happens to f along the vertical line at this fixed x: 0 67

68 For 0 x 1 2 we get the same behavior as before: f has two possible values a positive power of 2 and a negative power along intervals whose lengths make the total integral equal 0. For instance, at the first x in the picture, f has the value 16 along a vertical interval of length 1 4 and the value 8 along a vertical interval of length 1 2, so at this x we have: 1 ( ) ( ) 1 1 f(x, y) dy = 16 8 = Thus 1 0 f(x, y) dy = 0 for 0 x 1 2, so: f(x, y) dy dx = 0 1/2 1 0 f(x, y) dy 0 } {{ } 0 1 dx + 1/2 1 0 f(x, y) dy dx = 1 1/2 1 0 f(x, y) dy dx. For 1 2 x 1, f only has one nonzero value: 4 along a vertical interval of length 1 2. Thus 1 and hence 1 so f(x, y) dy = 4 1/2 1 0 ( ) 1 = 2 for x 1, f(x, y) dy dx = f(x, y) dy dx = 1 1 1/ dx = 1, f(x, y) dx dy = 0. Therefore the iterated integrals of this function both exist but are not equal. (Fubini s Theorem will not apply here because f is not integrable over the unit square in the two-variable sense.) Example: equality of iterated integrals does not imply integrability. (This example is also in the book.) Define f : [0, 1] [0, 1] R by { 1 (x, y) = ( p f(x, y) = 2, q ) n 2 Q 2 n 0 otherwise. So, f gives the value 1 on points whose coordinates are both rational with denominators equal to the same power of 2 and f gives the value 0 otherwise. For a fixed y [0, 1], there are only finitely 68

many x such that f(x, y) = 1: indeed, if y is not of the form q/2^n there are no such x, since f(x, y) = 0 for all x in this case, whereas if y = q/2^n only

x = 1/2^n, 2/2^n, ..., 2^n/2^n

will satisfy f(x, y) = 1. Thus for any y ∈ [0, 1], f(·, y) differs from the constant zero function only on a finite set of volume zero, so f(·, y) is integrable on [0, 1] with integral 0. The same reasoning applies to show that for any x ∈ [0, 1], f(x, ·) is integrable on [0, 1] with integral zero. Hence:

∫_0^1 (∫_0^1 f(x, y) dx) dy = ∫_0^1 0 dy = 0   and   ∫_0^1 (∫_0^1 f(x, y) dy) dx = ∫_0^1 0 dx = 0,

so the iterated integrals of f exist and are equal.

However, we claim that f is not integrable over [0, 1] × [0, 1]. Indeed, the set of points

E = {(p/2^n, q/2^n) : p, q ∈ Z and n ≥ 0}

is dense in R^2 as a consequence of the fact that the set of rationals with denominator a power of 2 is dense in R, and the complement of E is also dense in R^2 since it contains points with irrational coordinates. Thus given any grid G on [0, 1] × [0, 1], any small rectangle will always contain a point of E and a point of E^c, so sup f = 1 and inf f = 0 on any small rectangle. Thus U(f, G) = 1 and L(f, G) = 0 for any grid, so the upper integral of f is 1 and the lower integral is 0, showing that f is not integrable over the unit square. Thus, existence and equality of iterated integrals does not imply integrability of the function in question.

Fubini's Theorem. The above examples show that iterated integrals do not always behave in expected ways, but if all integrals in question (the double integral and both iterated integrals) do exist, then they will all have the same value, which is the statement of Fubini's Theorem:

Suppose that f : [a, b] × [c, d] → R is integrable, that for each y ∈ [c, d] the single-variable function f(·, y) is integrable on [a, b], and that for each x ∈ [a, b] the single-variable function f(x, ·) is integrable on [c, d]. Then

∫∫_{[a,b]×[c,d]} f(x, y) d(x, y) = ∫_c^d ∫_a^b f(x, y) dx dy = ∫_a^b ∫_c^d f(x, y) dy dx.

More generally, even if one of the iterated integrals does not exist, we will still have equality among the remaining iterated integral and the double integral. A similar statement holds in higher dimensions, and for other Jordan regions apart from rectangles as well.

We give a different and easier to follow proof than the book's. The book's proof depends on Lemma 12.30, which is a nice but somewhat complicated result on its own; however, if all we are interested in is a proof of Fubini's Theorem, avoiding the use of this lemma gives a simpler argument. The key point is that the double integral ∫∫_{[a,b]×[c,d]} f(x, y) d(x, y) is defined in terms of two-dimensional upper and lower sums, whereas the iterated integrals in question do not explicitly involve these two-dimensional sums; yet nonetheless, we can come up with inequalities which relate the required upper and lower sums to the iterated integrals we want.

70 Proof. The partitions of [a, b] and [c, d] respectively: a = x 0 < x 1 < < x n = b and c = y 0 < y 1 < < y n = b and let G be the corresponding grid. Let m ij and M ij respectively denote the infimum and supremum of f over the small rectangle R ij = [x i 1, x i ] [y j 1, y j ], and set x i = x i x i 1 and y j = y j y j 1. For a fixed y [y j 1, y j ], we have: so m ij x i = xi m ij f(x, y) M ij for x [x i 1, x i ], x i 1 m ij dx xi x i 1 f(x, y) dx xi x i 1 M ij dx = M ij x i. Note that the middle integral exists since we are assuming that the single-variable function f(, y) is integrable for any y. Summing these terms up over all subintervals [x i 1, x i ] gives: m ij x i i b a f(x, y) dx i M ij x i where for the middle term we use the fact that the subintervals in question cover [a, b] so that: b x1 x2 xn a f(x, y) dx = f(x, y) dx + f(x, y) dx + + f(x, y) dx. x 0 x 1 x n 1 Taking integrals throughout with respect to y over the interval [y j 1, y j ] preserves the inequalities, giving: ( ) yj yj b ( ) yj m ij x i dy f(x, y) dx dy M ij x i dy, y j 1 i y j 1 a y j 1 i which is the same as m ij x i y j i yj b y j 1 a f(x, y) dx dy i M ij x i y j since y j = y j y j 1 dy. Summing up over all the subintervals [y j 1, y j ] gives: m ij x i y j i,j d b c a f(x, y) dx dy i,j M ij x i y j where we use d c blah = y1 y 0 blah + y2 y 1 blah + + yn y n 1 blah. The left hand side in the resulting inequalities is precisely the lower sum L(f, G) and the right hand side is the upper sum U(f, G), so we get that L(f, G) d b c a f(x, y) dx dy U(f, G) 70

for any grid $G$, saying that the value of the iterated integral is sandwiched between all lower and upper sums. Taking the supremum of the lower sums and the infimum of the upper sums thus gives
$$\sup_G \{L(f,G)\} \leq \int_c^d\!\!\int_a^b f(x,y)\,dx\,dy \leq \inf_G \{U(f,G)\}.$$
Since $f$ is integrable over $[a,b] \times [c,d]$, this supremum and infimum are equal, and thus must equal the iterated integral in the middle, so
$$\int_{[a,b]\times[c,d]} f(x,y)\,d(x,y) = \int_c^d\!\!\int_a^b f(x,y)\,dx\,dy$$
as claimed.

A similar argument switching the roles of $x$ and $y$ shows that
$$\int_{[a,b]\times[c,d]} f(x,y)\,d(x,y) = \int_a^b\!\!\int_c^d f(x,y)\,dy\,dx,$$
as the rest of Fubini's Theorem claims. To be clear, we start by fixing $x \in [x_{i-1}, x_i]$ and use
$$m_{ij} \leq f(x,y) \leq M_{ij} \quad\text{for } y \in [y_{j-1}, y_j]$$
to get
$$m_{ij}\,\Delta y_j = \int_{y_{j-1}}^{y_j} m_{ij}\,dy \leq \int_{y_{j-1}}^{y_j} f(x,y)\,dy \leq \int_{y_{j-1}}^{y_j} M_{ij}\,dy = M_{ij}\,\Delta y_j.$$
Then we take sums over all subintervals $[y_{j-1}, y_j]$, which gives
$$\sum_j m_{ij}\,\Delta y_j \leq \int_c^d f(x,y)\,dy \leq \sum_j M_{ij}\,\Delta y_j,$$
and finally we take integrals with respect to $x$ over $[x_{i-1}, x_i]$ and sum up over such intervals to get
$$L(f,G) \leq \int_a^b\!\!\int_c^d f(x,y)\,dy\,dx \leq U(f,G).$$
The same conclusions as before then hold.

Remark. If the double integral and only one of the iterated integrals in question exist, the proof of Fubini's Theorem still gives equality between these two integrals. It is possible to find examples where $\int_R f(x,y)\,d(x,y)$ and one of the iterated integrals exist, but the other iterated integral does not. The book has such an example based on what I declared to be my favorite function of all time in the first quarter; check there for details.

Important. If $f$ is integrable over a Jordan region $D \subseteq \mathbb{R}^n$, then its integral can be computed using any iterated integrals which exist as well. However, it is possible that these iterated integrals exist (with the same or different values) even if $f$ is not integrable over $D$.
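As an informal complement to the proof (and certainly not a substitute for it), here is a minimal Python sketch comparing a double integral with its two iterated integrals for an integrand satisfying all of the hypotheses above; the particular integrand and the use of scipy's quadrature routines are arbitrary choices made only for this illustration.

```python
import numpy as np
from scipy import integrate

# Hypothetical continuous integrand on [0,1] x [0,1]; any integrand satisfying the
# hypotheses of Fubini's Theorem would do here.
f = lambda x, y: np.cos(3 * x**2 + y)

# Double integral over the rectangle (dblquad's integrand takes y first by convention).
double, _ = integrate.dblquad(lambda y, x: f(x, y), 0, 1, 0, 1)

# Iterated integral: integrate in x first for each fixed y, then integrate the result in y.
dx_then_dy, _ = integrate.quad(lambda y: integrate.quad(lambda x: f(x, y), 0, 1)[0], 0, 1)

# Iterated integral in the other order.
dy_then_dx, _ = integrate.quad(lambda x: integrate.quad(lambda y: f(x, y), 0, 1)[0], 0, 1)

print(double, dx_then_dy, dy_then_dx)  # all three values agree to numerical precision
```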

May 8, 2015: Change of Variables

Today we spoke about the change of variables formula for multivariable integrals, which will allow us to write integrals given in terms of one set of coordinates in terms of another set. Together with Fubini's Theorem, this gives the most general method for computing explicit integrals, and we'll see later on that it also justifies the formulas we use for so-called line and surface integrals taken over curves and surfaces respectively.

Warm-Up. We justify Cavalieri's Principle: given two solids $A$ and $B$ of the same height, if for any $z$ the two-dimensional regions $A_z$ and $B_z$ obtained by intersecting $A$ and $B$ respectively with the horizontal plane at height $z$ have the same area, then $A$ and $B$ have the same volume. The picture to have in mind is the following, where on the left we have a cylinder and on the right a cylinder with a "bulge":

The volumes in question are given by the integrals
$$\operatorname{Vol} A = \int_A 1\,dV \quad\text{and}\quad \operatorname{Vol} B = \int_B 1\,dV.$$
Since the constant function $1$ is continuous, Fubini's Theorem applies to say that each of these integrals can be computed using iterated integrals of the form
$$\operatorname{Vol} A = \int_0^h \Big( \int_{A_z} 1\,dx\,dy \Big)\,dz \quad\text{and}\quad \operatorname{Vol} B = \int_0^h \Big( \int_{B_z} 1\,dx\,dy \Big)\,dz,$$
where $h$ is the common height of the two solids. To be clear about the notation, at a fixed $z$ the inner integrals in terms of $x$ and $y$ are taken over the piece of the solid which occurs at that fixed height, which are precisely what $A_z$ and $B_z$ denote. But integrating $1$ over these two-dimensional regions gives their area, so we can write the integrals above as
$$\operatorname{Vol} A = \int_0^h (\text{area of } A_z)\,dz \quad\text{and}\quad \operatorname{Vol} B = \int_0^h (\text{area of } B_z)\,dz.$$
We are assuming that for any $z$, $A_z$ and $B_z$ have the same area, and thus the integrals above are the same, showing that $\operatorname{Vol} A = \operatorname{Vol} B$ as claimed.

Fun applications of Cavalieri's Principle. For this class, the important part of the above Warm-Up was the use of Fubini's Theorem. But just for fun, here are some well-known uses of Cavalieri's Principle. First, take a cone of height and radius $h$ sitting within a cylinder of height and radius $h$, and take the upper half of a sphere of radius $h$:

The claim is that the volume of this half-sphere is equal to the volume of the region within the cylinder but outside the cone. Indeed, fix a height $z$ and look at the intersections of these two solids with the horizontal plane at height $z$. For the solid consisting of the region inside the cylinder but outside the cone, this intersection is the two-dimensional region inside a circle of radius $h$ and outside a circle of radius $z$, so it has area
$$(\text{area of larger disk}) - (\text{area of smaller disk}) = \pi h^2 - \pi z^2 = \pi(h^2 - z^2).$$
For the solid within the top half of the sphere, this intersection is a disk of radius $\sqrt{h^2 - z^2}$ by the Pythagorean Theorem, so it has area
$$\pi \left( \sqrt{h^2 - z^2} \right)^2 = \pi(h^2 - z^2).$$
Thus since these two intersections have the same area at any height $z$, Cavalieri's Principle implies that the two solids in question have the same volume.

If we know that the volume of the cone in question is $\frac{1}{3}\pi h^3$, then the volume of the region outside the cone is
$$(\text{volume of cylinder}) - (\text{volume of cone}) = \pi h^3 - \tfrac{1}{3}\pi h^3 = \tfrac{2}{3}\pi h^3,$$
which is thus the volume of the upper half of the sphere. Hence the volume of the entire sphere is twice this amount, which gives the well-known formula for the volume enclosed by a sphere of radius $h$, provided we know the formula for the volume of a cone. Or conversely, if we happen to know the formula for the volume of a sphere, this will give a way to derive the formula for the volume of a cone.

Second, take spheres of different radii and cut out from each a cylindrical piece through the middle so that the remaining solids have the same height:

The claim is that these remaining solids have the same volume, so that this volume only depends on the height $h$ used and not on the size of the sphere we started with. I'll leave the details to you, but the point is to show that the intersections of these left-over solids with any horizontal plane have the same area, so that Cavalieri's Principle applies again. This is known as the Napkin Ring Problem since the resulting solids which remain tend to look like the types of rings used to hold napkins in place in formal settings.

Change of Variables. The change of variables formula tells us how to express an integral written in terms of one set of coordinates as an integral written in terms of another set of coordinates. This change of coordinates is encoded by a function $\phi : V \to \mathbb{R}^n$ defined on some open subset $V$ of $\mathbb{R}^n$. We'll denote the coordinates in the domain by $u \in V$ and the coordinates in the codomain by $x \in \mathbb{R}^n$, so that we are making a change of variables of the form $x = \phi(u)$. In order for things to work out, we assume that $\phi$ is one-to-one, $C^1$, and that $D\phi$ is invertible. (We'll mention later where these assumptions come into play.)

Here is the statement. Suppose that $E \subseteq V$ is a Jordan region and that $f$ is integrable over $\phi(E)$. (For this integrability statement to make sense we have to know that $\phi(E)$ is also a Jordan region, which is actually a consequence of the conditions imposed on $\phi$, as we'll soon clarify.) Then $f \circ \phi$ is integrable over $E$ and
$$\int_{\phi(E)} f(x)\,dx = \int_E f(\phi(u))\,|\det D\phi(u)|\,du.$$
Thus, visually, $\phi$ sends the region $E$ in the $u$-coordinates to the region $\phi(E)$ in the $x$-coordinates, and the Jacobian determinant term plays the role of an "expansion factor" which tells us how volumes on one side relate to volumes on the other side.
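Before looking at examples, here is a rough numerical sketch of the formula in the simplest nontrivial setting, an affine map $\phi(u) = Au + b$ (so $D\phi$ is constantly $A$); the matrix, shift, integrand, and sample sizes below are all made-up choices for the illustration, not anything from the course.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical affine change of variables x = phi(u) = A u + b on the unit square E = [0,1]^2.
A = np.array([[2.0, 1.0], [0.5, 1.5]])
b = np.array([1.0, -1.0])
f = lambda x: np.exp(-x[..., 0]**2) + x[..., 1]**2    # an arbitrary continuous integrand

# Right side: integral over E of f(phi(u)) |det Dphi| du, via a midpoint Riemann sum.
n = 500
grid = (np.arange(n) + 0.5) / n
u = np.stack(np.meshgrid(grid, grid, indexing="ij"), axis=-1)
right = np.abs(np.linalg.det(A)) * f(u @ A.T + b).mean()

# Left side: integral of f over phi(E), by Monte Carlo rejection sampling from a bounding box.
corners = np.array([[0, 0], [1, 0], [0, 1], [1, 1]]) @ A.T + b
lo, hi = corners.min(axis=0), corners.max(axis=0)
x = rng.uniform(lo, hi, size=(2_000_000, 2))
w = np.linalg.solve(A, (x - b).T).T                    # w = phi^{-1}(x)
inside = np.all((w >= 0) & (w <= 1), axis=1)           # keep samples whose preimage lies in E
left = np.prod(hi - lo) * (f(x) * inside).mean()

print(left, right)   # the two values agree up to Monte Carlo error
```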

Some of the assumptions can be relaxed a bit: we only need $\phi$ to be one-to-one on $E$ away from a set of Jordan measure zero, and similarly we only need $D\phi$ to be invertible on $E$ away from a set of Jordan measure zero. This makes sense, since what happens on these sets of Jordan measure zero can't possibly affect the integrals in question. The requirement that $\phi$ be one-to-one guarantees that $\phi(E)$ is only "traced out" once, which we'll clarify in the following example.

Example. Let us run through the standard conversion you would have seen in a multivariable calculus course from rectangular to polar coordinates in double integrals, in order to make sure that the various assumptions in the change of variables formula indeed hold in this case. The function $\phi : \mathbb{R}^2 \to \mathbb{R}^2$ in this case is defined by $\phi(r, \theta) = (r\cos\theta, r\sin\theta)$. This is $C^1$ and has Jacobian determinant given by
$$\det D\phi(r, \theta) = \det \begin{pmatrix} \frac{\partial \phi_1}{\partial r} & \frac{\partial \phi_1}{\partial \theta} \\[2pt] \frac{\partial \phi_2}{\partial r} & \frac{\partial \phi_2}{\partial \theta} \end{pmatrix} = \det \begin{pmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{pmatrix} = r.$$
Thus this Jacobian matrix is invertible as long as $r \neq 0$, so it fails to be invertible only at the origin, which is okay since $\{(0,0)\}$ has Jordan measure zero.

Take $E = [0,1] \times [0,2\pi]$, which is a Jordan region. Then $\phi(E) = \overline{B_1(0,0)}$, the closed unit disk of radius $1$. Note that this is also a Jordan region, which as mentioned before is actually a consequence of the assumptions we have on $\phi$. Note that $\phi$ is not one-to-one on all of $E$ since $\phi(r, 0) = \phi(r, 2\pi)$ for all $r$, but that it only fails to be one-to-one along the bottom and top edges of the rectangle $E$, which have Jordan measure zero; as mentioned before, this is good enough for the change of variables formula to be applicable. Thus for some integrable function $f : \phi(E) \to \mathbb{R}$, the change of variables formula gives
$$\int_{\text{closed unit disk}} f(x,y)\,d(x,y) = \int_{[0,1]\times[0,2\pi]} f(r\cos\theta, r\sin\theta)\,r\,d(r,\theta),$$
just as you would expect.

Suppose that instead we took $E = [0,1] \times [0,4\pi]$, which still has image $\phi(E)$ equal to the closed unit disk. For the constant function $f = 1$, if the change of variables formula were applicable we would get
$$\int_{\text{closed unit disk}} 1\,d(x,y) = \int_{[0,1]\times[0,4\pi]} r\,d(r,\theta) = \int_0^{4\pi}\!\!\int_0^1 r\,dr\,d\theta = 2\pi,$$
which is nonsense because we know that the left-hand side should equal the area of the unit disk, which is $\pi$. (Note the use of Fubini's Theorem when computing the integral in polar coordinates as an iterated integral.) The problem is now that $\phi$ fails to be one-to-one throughout $E$ since $\phi(r, \theta) = \phi(r, \theta + 2\pi)$ for all $r$ and $0 \leq \theta \leq 2\pi$, and so the region on which $\phi$ is not one-to-one no longer has Jordan measure zero. Thus the change of variables formula is not applicable in this case. Geometrically, the problem is that allowing $\theta$ to go all the way up to $4\pi$ gives two copies of the unit disk superimposed on one another, which means that the unit disk is traced out twice.

Outline of proof of change of variables formula. The proof of the change of variables formula is quite involved, requiring multiple steps. The book divides these into various lemmas, and you can see that it takes multiple pages to work it all out. Here we only give an outline, emphasizing where the various assumptions we make come into play. Here are the basic steps required:

Step 0: show that if $\phi$ is $C^1$, one-to-one, and has invertible Jacobian matrix, then it sends Jordan regions to Jordan regions. I'm calling this Step 0 because it is actually done in Section 12.1 in the book as Theorem 12.10, so way before the change of variables section. This guarantees that the regions of integration in both integrals showing up in the change of variables formula are indeed Jordan regions. As mentioned previously, the one-to-one and invertibility requirements can be relaxed a bit.

Step 1: show that the change of variables formula holds in the special case where $E$ is a rectangle and $f$ is the constant function $1$. This is a lemma in the book, which we'll give some geometric intuition for in a bit.

Step 2: show that the change of variables formula holds for various pieces of the Jordan region $E$ and an arbitrary integrable function $f$. This is a lemma in the book, and all of the steps so far are summarized in another lemma there. This is also the step where it is shown that $f \circ \phi$ is automatically integrable as a consequence of our assumptions. The various "pieces" alluded to above come from the open sets on which $\phi$ is locally invertible as a consequence of the Inverse Function Theorem.

Step 3: use compactness to show that what happens over the local pieces of $E$ gives the required formula over all of $E$. This is a theorem in the book. From the previous steps you end up covering $E$ with various open rectangles, and compactness of $E$ allows us to work with only finitely many of these.

As mentioned, the details are quite involved, but since Jordan regions can in a sense be approximated by rectangles (an idea which is used in Step 2), the truly key part is Step 1. This is also the step which explains where the Jacobian determinant factor in the formula comes from, which has a linear algebraic origin. (Don't forget that locally, calculus is just linear algebra after all!) We finish with the geometric intuition behind Step 1.

Geometric intuition behind change of variables. Suppose that $E$ is a rectangle and that we are given some grid on it. Then $\phi$ transforms this into a possibly "curved" grid on $\phi(E)$. Since $\phi$ is differentiable, the Jacobian matrix $D\phi$ provides a good approximation to the behavior of $\phi$, and $D\phi$ roughly transforms a small rectangle $R$ in the grid on $E$ into a small curved parallelogram in the grid on $\phi(E)$. According to the geometric interpretation of determinants from linear algebra, the area of this parallelogram is related to the area of the rectangle from which it came (with $p$ a corner of $R$) by
$$\text{area of } \phi(R) \approx |\det D\phi(p)|\,(\text{area of } R).$$

Thus given an outer sum $\sum_i |R_i|$ for $E$, we get a corresponding "outer sum" for $\phi(E)$ of the form
$$\sum_i |\phi(R_i)| \approx \sum_i |\det D\phi(p_i)|\,|R_i|.$$
Note that this is not quite an outer sum since we don't have an honest grid on $\phi(E)$, but rather a curved grid. Also note that since $\phi$ is $C^1$, so that $D\phi$ is continuous, we have some control over how changing the grid on $E$ will alter the grid on $\phi(E)$. Now, since $\phi$ is $C^1$ with invertible Jacobian, the Inverse Function Theorem implies that $\phi^{-1}$ is also $C^1$. The Jacobian matrix $D\phi^{-1}$ transforms the curved grid on $\phi(E)$ into the original grid on $E$. Since $D\phi^{-1}$ is continuous, given an honest grid on $\phi(E)$ which approximates the curved grid, the curved grid on $E$ resulting from this honest grid after applying $D\phi^{-1}$ will approximate the original grid on $E$ fairly well. (A mouthful!) Using these ideas we can relate outer, lower, and upper sums on one side to those on the other side, and after some magic we get Step 1.

Important. For a one-to-one, $C^1$ change of variables $x = \phi(u)$ with invertible Jacobian matrix, we have
$$\int_{\phi(E)} f(x)\,dx = \int_E f(\phi(u))\,|\det D\phi(u)|\,du$$
whenever everything involved in this formula makes sense, meaning that $E$ should be a Jordan region and $f$ should be integrable on $\phi(E)$. A key takeaway is that the Jacobian determinant $\det D\phi$ tells us how volumes are transformed under a change of variables.

May 11, 2015: Curves

Today we started working towards our final topic: the theorems of vector calculus. As a first step we defined and looked at various properties of curves, most of which should be familiar from a previous multivariable calculus course. A key takeaway is that most things we do with curves are done with respect to specific parametric equations, but in the end the results we develop are independent of the choice of parametric equations.

Warm-Up 1. We compute the value of the integral
$$\int_E \cos(3x^2 + y^2)\,d(x,y)$$
where $E$ is the elliptical region defined by $x^2 + y^2/3 \leq 1$. Note that this integral exists since the integrand is continuous. We use the change of variables $x = r\cos\theta$, $y = \sqrt{3}\,r\sin\theta$, which we can describe using the function $\phi : [0,1] \times [0,2\pi] \to \mathbb{R}^2$ defined by $\phi(r,\theta) = (r\cos\theta, \sqrt{3}\,r\sin\theta)$. This is $C^1$, one-to-one away from a set of volume zero, and has Jacobian determinant
$$\det D\phi = \det \begin{pmatrix} \cos\theta & -r\sin\theta \\ \sqrt{3}\sin\theta & \sqrt{3}\,r\cos\theta \end{pmatrix} = \sqrt{3}\,r,$$
which is nonzero away from a set of volume zero as well. Since $E = \phi([0,1] \times [0,2\pi])$ and $3x^2 + y^2 = 3r^2$, the change of variables formula applies to give
$$\int_E \cos(3x^2 + y^2)\,d(x,y) = \int_{[0,1]\times[0,2\pi]} \cos(3r^2)\,\sqrt{3}\,r\,d(r,\theta).$$

By Fubini's Theorem, we have
$$\int_{[0,1]\times[0,2\pi]} \cos(3r^2)\,\sqrt{3}\,r\,d(r,\theta) = \int_0^{2\pi}\!\!\int_0^1 \sqrt{3}\,r\cos(3r^2)\,dr\,d\theta = \int_0^{2\pi} \frac{\sqrt{3}}{6}\sin 3\,d\theta = \frac{\pi \sin 3}{\sqrt{3}},$$
and thus $\int_E \cos(3x^2 + y^2)\,d(x,y) = \frac{\pi \sin 3}{\sqrt{3}}$ as well.

Warm-Up 2. Suppose that $f : \mathbb{R}^n \to \mathbb{R}^n$ is $C^1$, one-to-one, and has invertible Jacobian matrix at every point. We show that for each $x_0 \in \mathbb{R}^n$,
$$\lim_{r \to 0^+} \frac{\operatorname{Vol}(f(B_r(x_0)))}{\operatorname{Vol}(B_r(x_0))} = |\det Df(x_0)|.$$
(This is almost the same as an exercise on the homework, only there we do not assume that $f$ is one-to-one. I'll leave it to you to think about how to get around this subtlety.) First, we can rewrite the numerator of the fraction of which we are taking the limit as
$$\operatorname{Vol}(f(B_r(x_0))) = \int_{f(B_r(x_0))} dx = \int_{B_r(x_0)} |\det Df(u)|\,du,$$
where we use $x = f(u)$ as a change of variables, which we can do given the assumptions on $f$. Thus the fraction of which we are taking the limit is
$$\frac{\operatorname{Vol}(f(B_r(x_0)))}{\operatorname{Vol}(B_r(x_0))} = \frac{1}{\operatorname{Vol}(B_r(x_0))} \int_{B_r(x_0)} |\det Df(u)|\,du.$$
Since $f$ is $C^1$, the map $u \mapsto Df(u)$ is continuous, and hence so is $u \mapsto |\det Df(u)|$ since the operation of taking the determinant of a matrix is a continuous one, given that determinants can be expressed as polynomials in the entries of a matrix. Thus the integrand in the resulting integral is continuous at $x_0$, so an exercise from the previous homework gives
$$\lim_{r \to 0^+} \frac{\operatorname{Vol}(f(B_r(x_0)))}{\operatorname{Vol}(B_r(x_0))} = \lim_{r \to 0^+} \frac{1}{\operatorname{Vol}(B_r(x_0))} \int_{B_r(x_0)} |\det Df(u)|\,du = |\det Df(x_0)|$$
as claimed.

Calculus is linear algebra, redux. The previous result should be viewed as another instance of the fact that, locally, multivariable calculus is just linear algebra. Indeed, the linear algebraic fact we are generalizing here is that for an invertible matrix $A$,
$$\frac{\operatorname{Vol}(A(\Omega))}{\operatorname{Vol}(\Omega)} = |\det A|$$
for any Jordan region $\Omega$ with nonzero volume. The result above says that this is true "in the limit" for a more general $C^1$ function with invertible Jacobian. Rewriting this equality as
$$\operatorname{Vol}(A(\Omega)) = |\det A|\,\operatorname{Vol}(\Omega)$$
suggests that the entire change of variables formula itself should be viewed as the non-linear analog of this linear algebraic fact, which we alluded to when outlining the proof of the change of variables formula.
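Here is a rough Monte Carlo illustration of this limit in Python; the particular map $f$ (chosen so that its inverse is available in closed form), the point $x_0$, and the sampling parameters are all arbitrary choices made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(1)

# A hypothetical C^1 bijection of R^2 with invertible Jacobian, chosen so that its
# inverse is available in closed form: f(u, v) = (u e^v, v), f^{-1}(x, y) = (x e^{-y}, y).
f      = lambda u, v: (u * np.exp(v), v)
f_inv  = lambda x, y: (x * np.exp(-y), y)
det_Df = lambda u, v: np.exp(v)          # determinant of [[e^v, u e^v], [0, 1]]

x0 = np.array([1.0, 0.5])

for r in [0.5, 0.1, 0.02]:
    # Bounding box for f(B_r(x0)): for this map the coordinate extremes of the image
    # occur on the boundary circle, so push forward a dense sample of boundary points.
    t = np.linspace(0, 2 * np.pi, 4001)
    bu, bv = x0[0] + r * np.cos(t), x0[1] + r * np.sin(t)
    iu, iv = f(bu, bv)
    lo = np.array([iu.min(), iv.min()]) - 1e-3 * r
    hi = np.array([iu.max(), iv.max()]) + 1e-3 * r

    # Monte Carlo estimate of Vol(f(B_r(x0))): sample the box, pull back, test membership.
    N = 400_000
    samples = rng.uniform(lo, hi, size=(N, 2))
    wu, wv = f_inv(samples[:, 0], samples[:, 1])
    inside = (wu - x0[0])**2 + (wv - x0[1])**2 <= r**2
    vol_image = np.prod(hi - lo) * inside.mean()

    ratio = vol_image / (np.pi * r**2)
    print(r, ratio, abs(det_Df(*x0)))    # ratio approaches e^{0.5} ~ 1.6487 as r shrinks
```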

Curves. A curve in $\mathbb{R}^n$ should be what we expect it to be, namely some sort of 1-dimensional object. To give a precise definition, we say that a $C^p$ curve in $\mathbb{R}^n$ is the image $C$ of a $C^p$ function $\phi : I \to \mathbb{R}^n$ where $I$ is an interval and $\phi$ is one-to-one on the interior $I^o$. To be clear, the curve $C$ being described is the one in $\mathbb{R}^n$ with parametric equations given by
$$\phi(t) = (x_1(t), \ldots, x_n(t)), \quad t \in I,$$
where $(x_1, \ldots, x_n)$ are the components of $\phi$. We call $\phi : I \to \mathbb{R}^n$ a parametrization of $C$. For the most part, $C^1$ curves (i.e. curves which can be described using continuously differentiable parametric equations) are what we will be interested in.

We say that a curve with parametrization $\phi : I \to \mathbb{R}^n$ is an arc if $I$ is a closed interval $[a,b]$ (so arcs have a start point and an end point), is closed if it is an arc and $\phi(a) = \phi(b)$, and is simple if it does not intersect itself apart from possibly the common start and end point in the case of a closed curve. Simple, closed curves in $\mathbb{R}^2$ are the ones which divide the plane into two pieces: a piece interior to the curve and a piece exterior to it. Look up the famous Jordan Curve Theorem to learn more about this.

Smooth Curves. A curve $C \subseteq \mathbb{R}^n$ is said to be smooth at a point $x_0$ if there exists a $C^1$ parametrization $\phi : I \to \mathbb{R}^n$ of $C$ such that $\phi'(t_0) \neq 0$, where $t_0$ is the point in $I$ which gives $\phi(t_0) = x_0$. A curve is smooth if it is smooth at each of its points. As we will show in a bit, the point is that smooth curves are the ones which have well-defined tangent lines.

To clarify one possible subtlety in the definition, to say that a curve is smooth at a point means that $\phi'(t_0) \neq 0$ for some parametrization, but it is not true that every parametrization of a smooth curve has this property. For instance, the unit circle has parametric equations $\phi(t) = (\cos t, \sin t)$ for $t \in [0, 2\pi]$, and since $\phi'(t) = (-\sin t, \cos t)$ is never $0$, the unit circle is smooth everywhere. However, we can also take $\psi(t) = (\cos t^3, \sin t^3)$ for $t \in [0, \sqrt[3]{2\pi}]$ as parametric equations for the unit circle, but in this case $\psi'(t) = (-3t^2 \sin t^3, 3t^2 \cos t^3)$ is $0$ at $t = 0$, so this would be a non-smooth parametrization of the smooth unit circle. So, we can rephrase the definition of smoothness as saying that a smooth curve is one which has a smooth parametrization, even though it may have non-smooth parametrizations as well.

Why we care about smooth curves. To justify the definition of smoothness given above, we show that smooth curves in $\mathbb{R}^2$ have well-defined tangent lines. (Later we will also see that smooth curves are the ones which have well-defined "directions".) We assume that the only types of curves we can precisely define tangent lines for are those which are graphs of single-variable functions (indeed, the whole point of the definition of differentiability for a single-variable function is to capture the idea that the graph should have a well-defined tangent line), so the point is to show that a smooth curve in $\mathbb{R}^2$ can, at least locally, be described by such graphs. This will be an application of the Implicit Function Theorem.

So, suppose that $C \subseteq \mathbb{R}^2$ is a smooth $C^1$ curve with smooth $C^1$ parametrization $\phi : I \to \mathbb{R}^2$. Pick a point $x_0 \in C$ and $t_0 \in I$ such that $\phi(t_0) = x_0$. Then we have parametric equations
$$x = \phi_1(t), \quad y = \phi_2(t),$$

80 where (φ 1, φ 2 ) are the components of φ. Since φ is a smooth parametrization, φ (t 0 ) = (φ 1(t 0 ), φ 2(t 0 )) (0, 0), so at least one component is nonzero say that φ 1 (t 0) 0. By the Implicit Function Theorem, we can then solve for y in terms of x locally near x 0 in the given parametric equations, or to work it out a little more explicitly, by the Inverse Function Theorem we can solve for t in terms of x in x = φ 1 (t) locally near x 0 to get t = φ 1 (x) where φ 1 1 is C 1, and then plugging into the second parametric equation gives y = (φ 2 φ 1 1 )(x) which expresses y as a C 1 function of x, at least near x 0. Thus near x 0, the curve C is the graph of the function f = φ 2 φ 1 1, so the curve has a well-defined tangent line at x 0 = (x 0, y 0 ) given by y = f(x 0 ) + f (x 0 )(x x 0 ). Important. A smooth curve is one which has a smooth parametrization. Geometrically, this is precisely the condition needed to guarantee that the curve has well-defined tangent lines and tangent vectors. Arclength. Now we can define the notion of the arclength of a smooth C 1 arc. Suppose that C R n is a smooth C 1 arc with smooth C 1 parametrization φ : [a, b] R n. The arclength of C is by the definition the value of b φ (t) dt a where denotes the usual Euclidean norm. (This is likely a definition you saw in a previous course.) The intuition is that φ (t) gives a vector tangent to the curve at a given point, so φ (t) gives the length of a little infinitesimal piece of C and this integral is then adding up these infinitesimal lengths to get the total length. Arclength is independent of parametrization. In order for this to be a good definition, we have to know that the number obtained solely depends on the curve C itself and not on the parametrization used. So, suppose that ψ : [c, d] R n is another smooth parametrization of C. We take it as a given that for any two such parametrizations ψ : [c, d] R n and φ : [a, b] R n can be related by a change of variables function τ : [c, d] [a, b] such that ψ = φ τ, which describes how to move from the parameter u in ψ to the parameter t = τ(u) in φ: 80

81 That such a τ exists is given as Remark 13.7 in the book, which you should check on your own. With this τ at hand, we have first using a change of variables: φ (t) dt = φ (t) dt = φ (τ(u)) τ (u) du. [a,b] τ([c,d]) Since ψ = φ τ, the chain rules gives ψ (u) = φ (τ(u))τ (u), so this final integral is φ (τ(u)) τ (u) du = ψ (u) du, so [c,d] b a φ (t) dt = d c [c,d] [c,d] ψ (u) du as required in order to say that φ and ψ give the same arclength. Line integrals. Given a smooth C 1 arc C in R n with parametrization φ : [a, b] R n and a continuous function f : C R, we define the line integral of f over C to be: C f ds = b a f(φ(t)) φ (t) dt. This integral essentially adds up the values of f along the curve C, and the fact that φ is C 1 and that f is continuous guarantees that this integral exists. In a previous course you might have seen this referred to as a scalar line integral, to distinguish it from so-called vector line integrals which arise when integrating vector fields along curves we ll look at this type of line integral later on. Even though the definition given depends on a parametrization of C, an argument similar to that for arclength shows that the line integral of f over C is independent of the parametrization used. In fact, we can also try to define the integral C f ds via an upper/lower sum approach which makes no reference to parametrizations at all; we ll outline how to do this later on and argue that this approach gives the same value for C f ds as to how we ve defined it here. Important. The line integral of a continuous function f : C R over a smooth C 1 arc C is independent of parametrization. In the case where f is the constant function 1, this line integral gives the arclength of C. May 13, 2015: Surfaces Today we spoke about surfaces, where as we did last time with curves, the point is to give precise definitions and justifications for concepts you would have seen in a previous multivariable calculus course. Note that everything we now do is an analog of something we did for curves. Another approach to arclength. Before moving on to surfaces, we give another reason as to why the definition we gave for arclength last time makes sense geometrically, by relating it to another possible definition. This is all covered in the optional material at the end of Section 13.1 in the book. The idea is that to define the length of a curve we can also argue using line segment approximations. Say we are given some curve, and choose a finite number of points along it: 81

82 The distances d(p i, p i 1 ) (where d denotes the Euclidean distance) between successive points give some sort of approximation to the length of the curve between those points, and the total sum: d(p i, p i 1 ) i then underestimates the actual length of the curve. Taking more and more points results in better and better approximations, so we can try to define the length as the supremum of such sums: { } arclength of C = sup d(p i, p i 1 ). Curves for which this quantity is finite are called rectifiable, and the claim is that for rectifiable smooth C 1 curves, this definition of arclength agrees with the previous one. The idea behind the proof is as follows. Choose a parametrization φ : [a, b] R n of C and a partition of [a, b]. Then the points φ(x 0 ), φ(x 1 ),..., φ(x n ) give successive points on the curve, and the distance between then are given by the expressions φ(x i ) φ(x i 1 ). Thus the sum over the entire partition is φ(x i ) φ(x i 1 ). i Now, using some version of the Mean Value Theorem, we can approximate these terms by so φ(x i ) φ(x i 1 ) φ (x i ) xi, φ(x i ) φ(x i 1 ) i i i φ (x i ) x i. But the right-hand side can now be viewed as a Riemann sum for the integral b a φ (t) dt, so taking supremums of both sides gives the result. This is only meant to be a rough idea, but you can check the book for full details. One final thing to note: this approach to defining arclength only depends on the Euclidean distance d, and so can in fact be generalized to arbitrary metric spaces. The result is a definition of arclength which makes sense in any metric space! 82
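The comparison just described is easy to see numerically. The following Python sketch computes inscribed polygonal lengths for finer and finer partitions and compares them with the integral $\int_a^b \|\phi'(t)\|\,dt$; the particular curve is an arbitrary choice made for the illustration.

```python
import numpy as np

# An arbitrary smooth arc in R^3 and its derivative, used only for this illustration.
phi  = lambda t: np.stack([np.cos(t), np.sin(t), 0.5 * t**2], axis=-1)
dphi = lambda t: np.stack([-np.sin(t), np.cos(t), t], axis=-1)

a, b = 0.0, 2 * np.pi

# Arclength via the integral definition, approximated by a fine midpoint Riemann sum.
n = 1_000_000
t = a + (np.arange(n) + 0.5) * (b - a) / n
integral_length = (np.linalg.norm(dphi(t), axis=1) * (b - a) / n).sum()

# Inscribed polygonal approximations for finer and finer partitions.
for m in [4, 16, 64, 256, 1024]:
    pts = phi(np.linspace(a, b, m + 1))
    poly_length = np.linalg.norm(np.diff(pts, axis=0), axis=1).sum()
    print(m, poly_length)        # increases toward integral_length as m grows

print("integral:", integral_length)
```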

83 Warm-Up. Suppose that f : I R is a C 1 function on some interval I R such that f(θ) 2 + f (θ) 2 0 for all θ I. We show that the curve defined by the polar equation r = f(θ) is a smooth C 1 curve. The curve we are looking at is the one consisting of all points in R 2 whose polar coordinates satisfy r = f(θ). Thus our curve is given parametrically by: x = r cos θ = f(θ) cos θ y = r sin θ = f(θ) sin θ. Since f is C 1, the function φ : I R 2 given by φ(θ) = (f(θ) cos θ, f(θ) sin θ) is C 1 as well so we do have a C 1 curve. To check smoothness we compute: x (θ) = f (θ) cos θ f(θ) sin θ y (θ) = f (θ) sin θ + f(θ) cos θ. Then (x (θ), y (θ)) = (f (θ) cos θ f(θ) sin θ) 2 + (f (θ) sin θ + f(θ) cos θ) 2 = f (θ) 2 + f(θ) 2 0 after simplification. Thus φ (θ) = (x (θ), y (θ)) is nonzero everywhere, so C is smooth as claimed. Surfaces. Intuitively, a surface should be a 2-dimensional object in R 3. To make this precise, we say that a C p surface in R 3 is the image S of a C p function φ : E R 3 defined on a closed Jordan region E in R 2 which is one-to-one on E o. As with curves, we call φ : E R 3 a parametrization of S and its components (φ 1, φ 2, φ 3 ) give us parametric equations x = φ 1 (u, v), y = φ 2 (u, v), z = φ 3 (u, v) with (u, v) E for S. The fact that there are two parameters in these equations is what makes S two-dimensional. We note that, as with curves, a given surface can be expressed using different sets of parametric equations. In general, if φ : E R 3 and ψ : D R 3 are two parametrizations of a surface S, there is a change of variables function τ : D E such that ψ = φ τ, which tells us how to move from parameters (s, t) for ψ to parameters (u, v) = τ(s, t) for φ: The proof of this fact is similar to the proof of the corresponding fact for curves given in the book. 83

Example. Suppose that $S$ is the portion of the cone $z = \sqrt{x^2 + y^2}$ for $0 \leq z \leq h$, where $h$ is some fixed height. One set of parametric equations for $S$ is
$$\phi(u,v) = (u, v, \sqrt{u^2 + v^2}) \quad\text{with } (u,v) \in \overline{B_h(0,0)}.$$
Note however that this parametrization is not $C^1$ since the $z$-component is not differentiable at $(u,v) = (0,0)$, which corresponds to the point $(0,0,0)$ on the cone. Instead, the parametrization given by
$$\psi(r, \theta) = (r\cos\theta, r\sin\theta, r) \quad\text{with } (r,\theta) \in [0,h] \times [0,2\pi]$$
is $C^1$. (In fact, this parametrization is $C^\infty$, showing that the cone is a $C^\infty$ surface.)

Smooth surfaces and normal vectors. Suppose that $\phi : E \to \mathbb{R}^3$ is a $C^p$ ($C^1$ will usually be enough) parametrization of a surface $S$. Denote the parameters by $(u,v) \in E$. Holding $v$ fixed at a point and varying $u$ results in a parametrization $\phi(\cdot, v)$ of a curve on $S$, and thus differentiating with respect to $u$ gives a vector tangent to the surface, which we denote by $\phi_u$. Similarly, holding $u$ fixed gives a parametrization $\phi(u, \cdot)$ of another curve on $S$ with tangent vector $\phi_v$ obtained by differentiating with respect to $v$.

We say that $S$ is smooth at a point $(x_0, y_0, z_0) = \phi(u_0, v_0)$ if the cross product $(\phi_u \times \phi_v)(u_0, v_0)$ is nonzero at that point, in which case we call $(\phi_u \times \phi_v)(u_0, v_0)$ a normal vector to $S$ at $\phi(u_0, v_0)$. (Recall that this cross product is perpendicular to both tangent vectors $\phi_u$ and $\phi_v$, which intuitively suggests that it should indeed be perpendicular to the surface $S$.) We say that $S$ is smooth if it is smooth everywhere. This smoothness condition is what guarantees that well-defined tangent planes exist, as we will see in the Warm-Up next time. For now we mention that, as we saw with curves, a smooth surface may have non-smooth parametrizations; all that matters is that a smooth parametrization exists.

Concretely, if $\phi = (\phi_1, \phi_2, \phi_3)$ are the components of $\phi$, then
$$\phi_u \times \phi_v = \det \begin{pmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\[2pt] \frac{\partial \phi_1}{\partial u} & \frac{\partial \phi_2}{\partial u} & \frac{\partial \phi_3}{\partial u} \\[2pt] \frac{\partial \phi_1}{\partial v} & \frac{\partial \phi_2}{\partial v} & \frac{\partial \phi_3}{\partial v} \end{pmatrix} = \left( \frac{\partial \phi_2}{\partial u}\frac{\partial \phi_3}{\partial v} - \frac{\partial \phi_2}{\partial v}\frac{\partial \phi_3}{\partial u},\; -\frac{\partial \phi_1}{\partial u}\frac{\partial \phi_3}{\partial v} + \frac{\partial \phi_1}{\partial v}\frac{\partial \phi_3}{\partial u},\; \frac{\partial \phi_1}{\partial u}\frac{\partial \phi_2}{\partial v} - \frac{\partial \phi_1}{\partial v}\frac{\partial \phi_2}{\partial u} \right).$$
But note that the first component here can be viewed as the Jacobian determinant of the matrix
$$\begin{pmatrix} \frac{\partial \phi_2}{\partial u} & \frac{\partial \phi_3}{\partial u} \\[2pt] \frac{\partial \phi_2}{\partial v} & \frac{\partial \phi_3}{\partial v} \end{pmatrix},$$
and similarly the other components can also be viewed as Jacobian determinants. (Technically, the matrix above is the transpose of the Jacobian matrix $D(\phi_2, \phi_3)$ of the function defined by the components $(\phi_2, \phi_3)$, but since a matrix and its transpose have the same determinant this will not affect our formulas.) That is, we have
$$\phi_u \times \phi_v = \big( \det D(\phi_2, \phi_3),\; -\det D(\phi_1, \phi_3),\; \det D(\phi_1, \phi_2) \big),$$
where $D(\phi_i, \phi_j)$ denotes the Jacobian matrix of the function defined by the components $\phi_i$ and $\phi_j$. To save space, we will also denote this normal vector by $N_\phi$.
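Since the signs in these component formulas are easy to get wrong, here is a small Python check comparing the cross product of the tangent vectors (computed by finite differences) with the Jacobian-determinant expression above; the sample parametrization and the point are arbitrary choices made for the illustration.

```python
import numpy as np

# A hypothetical smooth parametrization, chosen arbitrarily for the check.
phi = lambda u, v: np.array([u * np.cos(v), u * np.sin(v), u**2 - v])

def tangents(u, v, h=1e-6):
    """Approximate phi_u and phi_v by central differences."""
    phi_u = (phi(u + h, v) - phi(u - h, v)) / (2 * h)
    phi_v = (phi(u, v + h) - phi(u, v - h)) / (2 * h)
    return phi_u, phi_v

u0, v0 = 0.7, 1.3
phi_u, phi_v = tangents(u0, v0)

# Normal vector directly as a cross product of the tangent vectors.
N_cross = np.cross(phi_u, phi_v)

# Normal vector via the 2x2 Jacobian determinants of pairs of components.
det = lambda i, j: phi_u[i] * phi_v[j] - phi_u[j] * phi_v[i]
N_jac = np.array([det(1, 2), -det(0, 2), det(0, 1)])

print(N_cross, N_jac)   # the two vectors agree (up to finite-difference error)
```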

Important. A surface is smooth if it has nonzero normal vectors at every point. (Oftentimes, being smooth except on a set of volume zero will be enough.) Geometrically, this condition guarantees the existence of a well-defined tangent plane.

Back to cone example. Consider the $C^1$ parametrization $\psi(r,\theta) = (r\cos\theta, r\sin\theta, r)$ of the cone $z = \sqrt{x^2 + y^2}$ we saw earlier. We have
$$\psi_r = (\cos\theta, \sin\theta, 1) \quad\text{and}\quad \psi_\theta = (-r\sin\theta, r\cos\theta, 0),$$
so normal vectors are given by
$$\psi_r \times \psi_\theta = (-r\cos\theta, -r\sin\theta, r).$$
This is nonzero as long as $r \neq 0$, so we see that the cone is smooth everywhere except at $(0,0,0)$, which is the tip of the cone. Note that it makes sense geometrically that the cone should not be smooth at this point, since the cone does not have a well-defined tangent plane at this point.

How normal vectors change under a change of variables. Suppose that $\phi : E \to \mathbb{R}^3$ and $\psi : D \to \mathbb{R}^3$ are two parametrizations of a smooth $C^1$ surface $S$, with parameters $(u,v) \in E$ and $(s,t) \in D$. We can directly relate the normal vectors obtained by $\phi$ to those obtained by $\psi$ as follows. Recall that given these parametrizations there exists a $C^1$ function $\tau : D \to E$ such that $\psi = \phi \circ \tau$. Also recall the expression derived previously for normal vectors determined by $\psi$:
$$\psi_s \times \psi_t = \big( \det D(\psi_2, \psi_3),\; -\det D(\psi_1, \psi_3),\; \det D(\psi_1, \psi_2) \big).$$
Since $\psi = \phi \circ \tau$, we also have
$$(\psi_i, \psi_j) = (\phi_i, \phi_j) \circ \tau, \quad\text{so}\quad \det D(\psi_i, \psi_j) = \big( (\det D(\phi_i, \phi_j)) \circ \tau \big)(\det D\tau)$$
by the chain rule. Thus
$$\psi_s \times \psi_t = (\det D\tau)\,\big( \det D(\phi_2, \phi_3),\; -\det D(\phi_1, \phi_3),\; \det D(\phi_1, \phi_2) \big) \circ \tau,$$
so we get that
$$N_\psi(s,t) = (\det D\tau(s,t))\,N_\phi(\tau(s,t)).$$
Hence normal vectors determined by $\psi$ are obtained by multiplying those determined by $\phi$ by $\det D\tau$, so we should think of $\det D\tau$ as a type of "expansion factor" telling us how normal vectors are affected under a change of variables. This is analogous to the formula $\psi'(u) = \tau'(u)\,\phi'(\tau(u))$ relating tangent vectors determined by two parametrizations of a curve, where $\tau'$ (a single-variable change of coordinates in this case) plays a similar expansion factor role.

Surface area. Suppose that $S$ is a smooth $C^1$ surface. Given a smooth $C^1$ parametrization $\phi : E \to \mathbb{R}^3$ with parameters $(u,v) \in E$, we define the surface area of $S$ to be
$$\int_E \|N_\phi(u,v)\|\,d(u,v).$$

86 Intuitively, φ u φ v gives the area of the little infinitesimal portion of S swept out by the tangent vectors φ u and φ v, so to obtain the total surface area we add up all of these infinitesimal areas. As with arclength, this definition is independent of parametrization, as we now show. Let ψ : D R 3 be another smooth C 1 parametrization with τ : D E satisfying ψ = φ τ, so that (u, v) = τ(s, t). Then: N φ (u, v) d(u, v) = N φ (u, v) d(u, v) E = = τ(d) D D N φ (τ(s, t)) det Dτ(s, t) d(s, t) N ψ (s, t) d(s, t) where in the second line we ve used a change of variables and in the final line the relation between N ψ and N φ derived above. Thus the surface area as computed using ψ is the same as that computed using φ, so the surface area is independent of parametrization. Surface integrals. Given a smooth C 1 surface S with parameterization φ : E R 3 and a continuous function f : S R 3, we define the surface integral of f over S to be: f ds := f(φ(u, v)) φ u φ v d(u, v), S E which we interpret as adding up all the values of f as we vary throughout S. Note that the book uses dσ instead of ds in the notation for surface integrals, but I like ds since it emphasizes better that we are integrating over a surface. Using an argument similar to the one for surface area, it can be shown that this definition is also independent of parametrization. As such, it makes sense to ask whether we can define surface integrals without using parametrizations at all. We ll come back to this next time. Important. Surface integrals arise when integrating functions over surfaces and are independent of the parametrization used. In the case where f is the constant function 1, the surface integral gives the surface area. May 15, 2015: Orientations Today we spoke about orientations of both curves and surfaces. The point is that an orientation will give us a to turn vector-valued functions (i.e. vector fields) into scalar-valued functions which are then suitable for integration. Although curves are always orientable, we ll see a well-known example of a surface which is not orientable. Warm-Up. Suppose that S is a C 1 surface which is smooth at (x 0, y 0, z 0 ) S. We show that S then has a well-defined tangent plane at (x 0, y 0, z 0 ). To be clear, we are assuming that the only type of surface for which tangent planes are well-defined are graphs of differentiable functions f : V R with V R 2 (indeed, you can take the definition of differentiable in this setting as what it means for the graph to have a well-defined tangent plane), so the claim is that locally near (x 0, y 0, z 0 ) we can express S as the graph of such a function. This calls for the Implicit/Inverse Function Theorem. Since S is smooth at (x 0, y 0, z 0 ) there exists a parametrization φ : E R 3 with φ(u 0, v 0 ) = (x 0, y 0, z 0 ) such that N φ (u 0, v 0 ) 0. Recall from last time the expression N φ (u 0, v 0 ) = (det D(φ 2, φ 3 )(u 0, v 0 ), det D(φ 1, φ 3 )(u 0, v 0 ), det D(φ 1, φ 2 )(u 0, v 0 )) 86

87 for the normal vector N φ, where φ = (φ 1, φ 2, φ 3 ) are the components of φ. Since this is nonzero, at least one component is nonzero we will assume that it is the third component which is nonzero. (This will result in expressing z as a function of x and y, so if instead one of the other components were nonzero we would end up expressing that corresponding variable as a function of the other two.) Our parametric equations looks like x = φ 1 (u, v), y = φ 2 (u, v), z = φ 3 (u, v) (u, v) E, so since det D(φ 1, φ 2 )(u 0, v 0 ) 0 and (φ 1, φ 2 ) is C 1, the Inverse Function Theorem implies that we can locally express u and v as a C 1 function of x and y: near (x 0, y 0, z 0 ). This gives (u, v) = ψ(x, y) for some C 1 function ψ z = φ 3 (u, v) = (φ 3 ψ)(x, y) near (x 0, y 0, z 0 ), which expresses S near this point as the graph of the C 1 function f := φ 3 ψ. Thus S has a well-defined tangent plane at (x 0, y 0, z 0 ) which is explicitly given by where f is the C 1 function defined above. z = f(x 0, y 0 ) + f x (x 0, y 0 )(x x 0 ) + f y (x 0, y 0 )(y y 0 ) Line and surface integrals without parametrizations. Before moving on to orientations, we outline an attempt to define line and surface integrals without resorting to using parametric equations, mimicking the definitions we gave for integrals in terms of Riemann sums. Suppose we are given some smooth C 1 arc C R n and a continuous function f : C R n. To define the integral of f over C we can proceed by partitioning C by choosing successive points p 0, p 1,..., p n along C: taking the infimum or supremum of f along the part of C between successive partition points, and then forming lower and upper sums (inf f)d(p i, p i 1 ) and (sup f)d(p i, p i 1 ) i where d denotes the ordinary Euclidean distance. (Note that we are essentially mimicking the alternate approach to defining arclength we outlined previously for rectifiable curves.) We would then define C f ds as the common value of the supremum of the lower sums or the infimum of the I 87

88 upper sums. It turns out that this definition works perfectly well for rectifiable curves and gives the value we would expect. However, note now that, in this setting, we would expect a change of variables formula to hold just as it does for other types of integrals we ve seen, where a change of variables function φ : [a, b] R n : is precisely a parametrization of C! Thus, the change of variable formula in this setting would give f ds = f ds = f(φ(t)) φ (t) dt, C φ([a,b]) where φ (t) is the Jacobian expansion factor, which is precisely the definition we gave for C f ds using a parametrization. Here s the point: even if we defined line integrals using a Riemann sum approach, the corresponding change of variables formula would imply that this definition to the one in terms of a parametrization, so rather than go through the trouble of defining such Riemann sums we simply take the parametrization approach as our definition, since in the end it would give the correct answer anyway. The same is true for surface integrals. We can define surface integrals independently of parametrization by using certain upper and lower sums by picking grids on our surface, but in the end the change of variables formula will say that the resulting integral is equal to the one given in terms of a parametrization, so we simply take the latter approach as our definition of a surface integral, thereby avoiding having to develop some extra theory. This is morally why constructions involving curves and surfaces always come down to some computations in terms of parametrizations: even if we could perform these constructions without using parametric equations, in the end we would get the same types of objects using parametric equations anyway. Orientations of curves. Given a smooth C 1 curve C, an orientation on C is simply a choice of continuously-varying unit tangent vectors along C. (We need the smoothness and C 1 assumptions to guarantee that our curves have well-defined tangent vectors, coming from the fact that they have well-defined tangent lines.) Visually this just amounts to choosing a direction in which to follow C. Concretely, if φ : I R n is a parametrization of C, then the unit tangent vectors [a,b] φ (t) φ (t) give us one possible orientation of C and the negative of these gives us the other possible orientation. To be clear, the C 1 assumption says that the assignment t φ (t) of a tangent vector to each point on the curve is a continuous one, which is what we mean by saying that the tangent vectors 88

89 vary continuously along the curve. Note that because of this, for a connected curve we cannot have a scenario such as: since some version of the Intermediate Value Theorem would imply that in this setting there must be some p = φ(t 0 ) C at which φ (t 0 ) = 0, which is ruled out by the smoothness assumption on C. This is what guarantees that we are indeed choosing a single direction of movement along C when we are picking an orientation. When do parametrizations give the same orientation? We can easily determine when two parametrizations φ, ψ of a curve C give the same orientation. Recall that we have ψ = φ τ for some function t = τ(u). Then ψ (u) = φ (τ(u))τ (u), which implies that the unit vectors obtained from ψ and φ are the same precisely when τ (u) > 0. Thus two parametrizations give the same orientation when the change of parameters function τ has positive derivative everywhere. For instance, consider the unit circle C in R 2. This has possible parametric equations and also φ(t) = (cos t, sin t) for 2π t 4π, ψ(u) = (cos u 3, sin u 3 ) for 3 2π u 3 4π. In this case, τ(u) = u 3 is the function satisfying ψ = φ τ, and since τ (u) = 3u 2 is positive for all 3 2π u 3 4π, we have that φ and ψ determine the same orientation on C, which makes sense since both sets of parametric equations give the counterclockwise direction on the unit circle. Orientations of surfaces. Given a smooth C 1 surface S, an orientation on S is a choice of continuously-varying unit normal vectors across S. (Again, the smoothness and C 1 assumptions guarantee that S has well-defined tangent planes, so the the notion of a normal vector makes sense through S.) For a parametrization φ : E R 3 of S, the possible unit normal vectors are given by φ u φ v φ v φ u or φ u φ v φ u φ v, which are negatives of one another. The C 1 assumption says that the map (u, v) φ u φ v is continuous, which is what we mean by continuously-varying normal vectors across S. The smoothness assumption guarantees that for a connected surface we can t have one possible orientation along a portion of our surface but then opposite orientation along another portion: for this to be true the Intermediate Value Theorem would imply that at some point the normal vector would have to be zero, which is not allowed by 89

smoothness. However, there is a new subtlety with surfaces which we didn't see for curves: not all surfaces have well-defined orientations. We'll come back to this in a bit.

Recalling that the normal vectors of two parametrizations $\phi$ and $\psi$ are related by $N_\psi = (\det D\tau)N_\phi$, where $\tau$ is the change of parameters function satisfying $\psi = \phi \circ \tau$, we see that two parametrizations determine the same orientation when $\det D\tau > 0$ throughout $S$.

Example. Parametric equations for the unit sphere are given by
$$X(\varphi, \theta) = (\sin\varphi\cos\theta, \sin\varphi\sin\theta, \cos\varphi) \quad\text{for } (\varphi, \theta) \in [0,\pi] \times [0,2\pi].$$
This gives the normal vectors
$$X_\varphi \times X_\theta = \sin\varphi\,(\sin\varphi\cos\theta, \sin\varphi\sin\theta, \cos\varphi) = (\sin\varphi)\,X(\varphi,\theta).$$
Since $\sin\varphi > 0$ for $\varphi \in (0,\pi)$, this gives normal vectors which point in the same direction as the vector $X(\varphi,\theta)$ extending from the origin to a point on the sphere, so this gives the outward orientation on the sphere. The negatives of these normal vectors give the inward orientation. Note that the possibility that certain normal vectors point outward and others inward is ruled out by the Intermediate Value Theorem and the fact that $X$ is a $C^1$ parametrization, as mentioned earlier.

Non-orientable surfaces. Consider the surface $S$ with $C^1$ parametrization
$$\phi(u,v) = \left( \left(1 + v\cos\tfrac{u}{2}\right)\cos u,\; \left(1 + v\cos\tfrac{u}{2}\right)\sin u,\; v\sin\tfrac{u}{2} \right) \quad\text{for } (u,v) \in [0,2\pi] \times \left[-\tfrac{1}{2}, \tfrac{1}{2}\right].$$
A lengthy computation shows that
$$\phi_u \times \phi_v = \sin\tfrac{u}{2}\left( \cos u + 2v\left(\cos^3\tfrac{u}{2} - \cos\tfrac{u}{2}\right) \right)\mathbf{i} + \tfrac{1}{2}\left( 4\cos\tfrac{u}{2} - 4\cos^3\tfrac{u}{2} + v(1 + \cos u - \cos^2 u) \right)\mathbf{j} - \cos\tfrac{u}{2}\left( 1 + v\cos\tfrac{u}{2} \right)\mathbf{k},$$
and from this it is possible (although tedious) to show that $S$ is smooth everywhere. Now, note from the normal vector derived above that
$$N_\phi(0,0) = (0,0,-1) \quad\text{and}\quad N_\phi(2\pi,0) = (0,0,1).$$
However, going back to the parametric equations, we see that $\phi(0,0) = (1,0,0) = \phi(2\pi,0)$, so the values $(u,v) = (0,0)$ and $(u,v) = (2\pi,0)$ of our parameters both determine the same point on $S$. This is bad: using $(u,v) = (0,0)$ gave a normal vector of $(0,0,-1)$ at this point, while using $(u,v) = (2\pi,0)$ gave a normal vector of $(0,0,1)$, which points in the opposite direction. The conclusion is that the given parametrization does not give a unique well-defined normal vector at $(1,0,0)$. We will show as a Warm-Up next time that no choice of parametrization of this surface $S$ gives a well-defined normal vector at $(1,0,0)$, so the problem here is really a fault of the surface and not of any specific choice of parametrization.

The lack of being able to define a unique normal vector at each point of $S$ says that it is not possible to give $S$ an orientation, so we say that $S$ is non-orientable. The problem is that we

cannot make a consistent choice of continuously-varying normal vectors across $S$. We will avoid such surfaces in this class since the types of integrals we will soon consider depend on having an orientation. This surface in particular is known as the Möbius strip, and is what you get if you take a strip of paper, twist one end, and then glue the ends together; there's a picture in the book. Two other famous non-orientable surfaces are the Klein bottle and the real projective plane, although these really aren't surfaces according to our definition since they cannot be embedded in $\mathbb{R}^3$; rather, they would be examples of non-orientable surfaces in $\mathbb{R}^4$.

Important. An orientation of a curve is a continuous choice of unit tangent vectors along the curve, and an orientation of a surface is a continuous choice of unit normal vectors across the surface. All curves have orientations, but not all surfaces do.

May 18, 2015: Vector Line/Surface Integrals

Today we started talking about vector line/surface integrals, which the book calls oriented line/surface integrals. These are the integrals which arise when wanting to integrate a vector field over a curve or surface, and are the types of integrals which the Big Theorems of Vector Calculus deal with. We reviewed some properties and even computed a couple of explicit examples.

Warm-Up. We justify a claim we made last time about the Möbius strip $S$, that the lack of having a well-defined normal vector at every point throughout the surface really is a characteristic of the surface itself and not a fault of the specific parametrization we previously gave. To be precise, the claim is that if $\psi : D \to \mathbb{R}^3$ is any parametrization of $S$, then $\psi$ does not assign to the point $(1,0,0) \in S$ a well-defined unique normal vector.

Let $\phi : E \to \mathbb{R}^3$ be the parametrization we gave for $S$ last time. The key thing to recall is that for these equations we have
$$N_\phi(0,0) = -\mathbf{k} \quad\text{but}\quad N_\phi(2\pi, 0) = \mathbf{k},$$
even though $\phi(0,0)$ and $\phi(2\pi,0)$ both give the same point $(1,0,0)$ on $S$. If $\tau : D \to E$ is the change of parameters function satisfying $\psi = \phi \circ \tau$, then
$$N_\psi(s,t) = (\det D\tau(s,t))\,N_\phi(\tau(s,t)).$$
In particular, take $(s_0, t_0)$ to be one set of values which give $(1,0,0)$ and $(s_1, t_1)$ to be another, so that $\tau(s_0, t_0) = (0,0)$ and $\tau(s_1, t_1) = (2\pi, 0)$. Then
$$N_\psi(s_0, t_0) = (\det D\tau(s_0, t_0))\,N_\phi(0,0) = -(\det D\tau(s_0, t_0))\,\mathbf{k}$$
and
$$N_\psi(s_1, t_1) = (\det D\tau(s_1, t_1))\,N_\phi(2\pi, 0) = (\det D\tau(s_1, t_1))\,\mathbf{k}.$$
But $\det D\tau$ is either always positive or always negative, since otherwise the Intermediate Value Theorem would imply that it was zero somewhere, contradicting the smoothness of these parametrizations. Thus in either case, the two vectors above will never point in the same direction, since one is going to be a positive multiple of $\mathbf{k}$ and the other a negative multiple of $\mathbf{k}$. Since $(s_0, t_0)$ and $(s_1, t_1)$ both give the point $(1,0,0)$ on $S$, this shows that $\psi$ does not assign a unique normal vector to this point as claimed.

Vector line integrals. Suppose that $C$ is a smooth $C^1$ oriented arc in $\mathbb{R}^n$ and that $\mathbf{F} : C \to \mathbb{R}^n$ is a continuous function. (Such an $\mathbf{F}$ is said to be a continuous vector field along $C$, which we

normally visualize as a field of little vectors varying along $C$.) We define the vector line integral (or oriented line integral) of $\mathbf{F}$ over $C$ as
$$\int_C \mathbf{F} \cdot \mathbf{T}\,ds,$$
where $\mathbf{T}$ denotes the unit tangent vector field along $C$ determined by the given orientation. To be clear, the orientation determines $\mathbf{T}$, which is then used to turn the $\mathbb{R}^n$-valued function $\mathbf{F}$ into an $\mathbb{R}$-valued function $\mathbf{F} \cdot \mathbf{T}$, which is then integrated over the curve using a scalar line integral. This procedure is why we care about orientations! Given a parametrization $\phi : [a,b] \to \mathbb{R}^n$ of $C$, we get
$$\int_C \mathbf{F} \cdot \mathbf{T}\,ds = \int_a^b \mathbf{F}(\phi(t)) \cdot \frac{\phi'(t)}{\|\phi'(t)\|}\,\|\phi'(t)\|\,dt,$$
which gives the familiar formula
$$\int_C \mathbf{F} \cdot \mathbf{T}\,ds = \int_a^b \mathbf{F}(\phi(t)) \cdot \phi'(t)\,dt$$
you would have seen in a multivariable calculus course. (Of course, the value is actually independent of the parametrization.) Geometrically, this integral measures the extent to which you move with or against the "flow" of $\mathbf{F}$ as you move along the curve. To be concrete, the dot product $\mathbf{F} \cdot \mathbf{T}$ is positive when $\mathbf{F}$ and $\mathbf{T}$ point in the same general direction (i.e. when the angle between them is less than $90^\circ$) and is negative when $\mathbf{F}$ and $\mathbf{T}$ point in opposite directions (i.e. when the angle between them is greater than $90^\circ$), and the line integral then adds up all of these individual dot product contributions.

Example 1. Just in case it's been a while since you've looked at these types of integrals, we'll do an explicit computation. Let $C$ be the piece of the parabola $y = x^2$ oriented from $(\pi/2, \pi^2/4)$ to $(5\pi/4, 25\pi^2/16)$ and let $\mathbf{F} : C \to \mathbb{R}^2$ be
$$\mathbf{F}(x,y) = \left( -\frac{y \sin x}{x^2},\; \frac{\cos x}{2x} \right).$$
We use $\phi : [\pi/2, 5\pi/4] \to \mathbb{R}^2$ defined by $\phi(t) = (t, t^2)$ as a parametrization of $C$. Then
$$\int_C \mathbf{F} \cdot \mathbf{T}\,ds = \int_{\pi/2}^{5\pi/4} \mathbf{F}(\phi(t)) \cdot \phi'(t)\,dt = \int_{\pi/2}^{5\pi/4} \left( -\frac{t^2 \sin t}{t^2},\; \frac{\cos t}{2t} \right) \cdot (1, 2t)\,dt = \int_{\pi/2}^{5\pi/4} (-\sin t + \cos t)\,dt = -\sqrt{2} - 1.$$

Vector surface integrals. Suppose $S$ is a smooth $C^1$ oriented surface in $\mathbb{R}^3$ and that $\mathbf{F} : S \to \mathbb{R}^3$ is a continuous vector field on $S$. We define the vector surface integral (or oriented surface integral) of $\mathbf{F}$ over $S$ as
$$\int_S \mathbf{F} \cdot \mathbf{n}\,dS,$$

where $\mathbf{n}$ denotes the unit normal vector field across $S$ determined by the given orientation. Analogously to the line integral case, the point is that the orientation determines $\mathbf{n}$, which is then used to turn the vector-valued function $\mathbf{F}$ into a scalar-valued function $\mathbf{F} \cdot \mathbf{n}$, which is then integrated over $S$ using a scalar surface integral. Given a parametrization $\phi : E \to \mathbb{R}^3$ of $S$ such that $\phi_u \times \phi_v$ gives the correct direction for normal vectors, we get
$$\int_S \mathbf{F} \cdot \mathbf{n}\,dS = \int_E \mathbf{F}(\phi(u,v)) \cdot \frac{\phi_u \times \phi_v}{\|\phi_u \times \phi_v\|}\,\|\phi_u \times \phi_v\|\,d(u,v),$$
which gives the familiar formula
$$\int_S \mathbf{F} \cdot \mathbf{n}\,dS = \int_E \mathbf{F}(\phi(u,v)) \cdot (\phi_u \times \phi_v)\,d(u,v)$$
you would have seen in a multivariable calculus course. As with all of these types of integrals, the value is independent of parametrization. Geometrically, this integral measures the extent to which $\mathbf{F}$ "flows" across $S$, either with or against the orientation. Indeed, the dot product $\mathbf{F} \cdot \mathbf{n}$ is positive when $\mathbf{F}$ points in the same general direction as $\mathbf{n}$ and is negative when it points opposite the direction of $\mathbf{n}$, and the surface integral then adds up these individual quantities.

Example 2. We compute the vector surface integral of $\mathbf{F} = (-4, 0, -x)$ over the portion $S$ of the plane $x + z = 5$ which is enclosed by the cylinder $x^2 + y^2 = 9$, oriented with upward-pointing normal vectors. Parametric equations for $S$ are given by
$$\phi(r, \theta) = (r\cos\theta, r\sin\theta, 5 - r\cos\theta) \quad\text{for } (r,\theta) \in [0,3] \times [0,2\pi].$$
This gives
$$\phi_r \times \phi_\theta = (\cos\theta, \sin\theta, -\cos\theta) \times (-r\sin\theta, r\cos\theta, r\sin\theta) = (r, 0, r),$$
which gives the correct orientation since these normal vectors point upwards due to the positive third component. Thus
$$\int_S \mathbf{F} \cdot \mathbf{n}\,dS = \int_{[0,3]\times[0,2\pi]} \mathbf{F}(\phi(r,\theta)) \cdot (\phi_r \times \phi_\theta)\,d(r,\theta) = \int_0^{2\pi}\!\!\int_0^3 (-4, 0, -r\cos\theta) \cdot (r, 0, r)\,dr\,d\theta = \int_0^{2\pi}\!\!\int_0^3 (-4r - r^2\cos\theta)\,dr\,d\theta = -36\pi.$$
(A numerical sanity check of this value and of Example 1 appears after the next definition below.)

Important. Vector line and surface integrals are both defined by using orientations to turn vector-valued vector fields into scalar-valued functions (via taking dot products with tangent or normal vectors), and then integrating the resulting functions. In terms of parametric equations, we get the formulas we would have seen previously in a multivariable calculus course.

Fundamental Theorem of Line Integrals. We say that a continuous vector field $\mathbf{F} : E \to \mathbb{R}^n$ defined on some region $E \subseteq \mathbb{R}^n$ is conservative on $E$ if there exists a $C^1$ function $f : E \to \mathbb{R}$ such that $\mathbf{F} = \nabla f$. We call such a function $f$ a potential function for $\mathbf{F}$.
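Because dropped signs are the most common error in computations like the two examples above, here is a quick numerical sanity check of both values in Python; the quadrature resolutions are arbitrary choices, and the check is only meant as a rough confirmation.

```python
import numpy as np

# Example 1: line integral of F = (-y sin(x)/x^2, cos(x)/(2x)) over y = x^2,
# parametrized by phi(t) = (t, t^2) for t in [pi/2, 5pi/4]; midpoint Riemann sum in t.
a, b, n = np.pi / 2, 5 * np.pi / 4, 200_000
t = a + (np.arange(n) + 0.5) * (b - a) / n
x, y = t, t**2
Fx, Fy = -y * np.sin(x) / x**2, np.cos(x) / (2 * x)
line_val = ((Fx * 1 + Fy * (2 * t)) * (b - a) / n).sum()   # F(phi(t)) . phi'(t) dt
print(line_val, -np.sqrt(2) - 1)                           # both about -2.41421

# Example 2: surface integral of F = (-4, 0, -x) over the plane piece, using
# phi(r, th) = (r cos th, r sin th, 5 - r cos th) and phi_r x phi_th = (r, 0, r).
nr, nt = 600, 1200
r = (np.arange(nr) + 0.5) * 3 / nr
th = (np.arange(nt) + 0.5) * 2 * np.pi / nt
R, TH = np.meshgrid(r, th, indexing="ij")
integrand = -4 * R - R**2 * np.cos(TH)                     # F(phi) . (phi_r x phi_th)
surf_val = integrand.sum() * (3 / nr) * (2 * np.pi / nt)
print(surf_val, -36 * np.pi)                               # both about -113.097
```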

The value obtained when integrating a conservative field over a curve is given by the following analog of the Fundamental Theorem of Calculus for Line Integrals: if $f : E \to \mathbb{R}$ is $C^1$ and $C \subseteq E$ is a smooth oriented $C^1$ arc in $E$, then
$$\int_C \nabla f \cdot \mathbf{T}\,ds = f(\text{end point of } C) - f(\text{initial point of } C).$$
Indeed, suppose that $\phi : [a,b] \to \mathbb{R}^n$ is a parametrization of $C$. Then
$$\int_C \nabla f \cdot \mathbf{T}\,ds = \int_a^b \nabla f(\phi(t)) \cdot \phi'(t)\,dt.$$
By the chain rule, the integrand in this integral is the derivative of the single-variable function obtained as the composition $f \circ \phi$:
$$(f \circ \phi)'(t) = \nabla f(\phi(t)) \cdot \phi'(t).$$
So by the single-variable Fundamental Theorem of Calculus we have
$$\int_a^b \nabla f(\phi(t)) \cdot \phi'(t)\,dt = \int_a^b (f \circ \phi)'(t)\,dt = f(\phi(b)) - f(\phi(a)),$$
which gives
$$\int_C \nabla f \cdot \mathbf{T}\,ds = f(\text{end point of } C) - f(\text{initial point of } C)$$
as claimed. In particular, two immediate consequences are the following facts:

If $C_1, C_2$ are two smooth $C^1$ arcs with the same initial point and the same end point, then $\int_{C_1} \nabla f \cdot \mathbf{T}\,ds = \int_{C_2} \nabla f \cdot \mathbf{T}\,ds$. This property says that line integrals of conservative vector fields are path-independent, in the sense that the value only depends on the endpoints of the path and not on the specific path chosen to go between those points.

If $C$ is a smooth closed curve, then $\int_C \nabla f \cdot \mathbf{T}\,ds = 0$.

We will see next time that a field which has either of these properties must in fact be conservative.

Important. Line integrals of conservative fields can be calculated by evaluating a potential function at the end and start points of a curve, and subtracting. This implies that line integrals of conservative fields are path-independent and that the line integral of a conservative field over a closed curve is always zero.

May 22, 2015: Green's Theorem

Today we stated and proved Green's Theorem, which relates vector line integrals to double integrals. You no doubt saw applications of this in a previous course, and for us the point is to understand why it is true and, in particular, the role which the single-variable Fundamental Theorem of Calculus plays in its proof.

Warm-Up. Suppose that $\mathbf{F} : E \to \mathbb{R}^n$ is a continuous vector field on a region $E \subseteq \mathbb{R}^n$. We show that line

integrals of $\mathbf{F}$ are path-independent in $E$ if and only if $\mathbf{F}$ has the property that its line integral over any closed oriented curve in $E$ is zero. Last time we saw that conservative fields have both of these properties, so now we show that these two properties are equivalent to one another for any field. (We'll see in a bit that either of these properties in fact implies being conservative.)

Suppose that line integrals of $\mathbf{F}$ in $E$ are path-independent and let $C$ be a closed oriented curve in $E$. Pick two points $A$ and $B$ on $C$ and denote by $C_1$ the portion of $C$ which goes from $A$ to $B$ along the given orientation, and $C_2$ the portion which goes from $B$ back to $A$ along the given orientation. We use the notation $C_1 + C_2$ for the curve obtained by following $C_1$ and then $C_2$, so $C_1 + C_2 = C$. Now, denote by $-C_2$ the curve $C_2$ traversed with the opposite orientation, so going from $A$ to $B$. Then $C_1$ and $-C_2$ both start at $A$ and end at $B$, so by the path-independence assumption we have
$$\int_{C_1} \mathbf{F} \cdot \mathbf{T}\,ds = \int_{-C_2} \mathbf{F} \cdot \mathbf{T}\,ds.$$
Changing the orientation of a curve changes the sign of the line integral, so
$$\int_{-C_2} \mathbf{F} \cdot \mathbf{T}\,ds = -\int_{C_2} \mathbf{F} \cdot \mathbf{T}\,ds, \quad\text{and hence}\quad \int_{C_1} \mathbf{F} \cdot \mathbf{T}\,ds + \int_{C_2} \mathbf{F} \cdot \mathbf{T}\,ds = 0.$$
Thus
$$\int_C \mathbf{F} \cdot \mathbf{T}\,ds = \int_{C_1 + C_2} \mathbf{F} \cdot \mathbf{T}\,ds = \int_{C_1} \mathbf{F} \cdot \mathbf{T}\,ds + \int_{C_2} \mathbf{F} \cdot \mathbf{T}\,ds = 0,$$
showing that the line integral of $\mathbf{F}$ over a closed curve is zero as claimed.

Conversely, suppose that the line integral of $\mathbf{F}$ over any closed curve in $E$ is zero. Let $C_1$ and $C_2$ be two oriented curves in $E$ which start at the same point $A$ and end at the same point $B$. Then the curve $C_1 + (-C_2)$ obtained by following $C_1$ from $A$ to $B$ and then $C_2$ in the reverse direction from $B$ back to $A$ is closed, so our assumption gives
$$\int_{C_1 + (-C_2)} \mathbf{F} \cdot \mathbf{T}\,ds = 0.$$
But
$$\int_{C_1 + (-C_2)} \mathbf{F} \cdot \mathbf{T}\,ds = \int_{C_1} \mathbf{F} \cdot \mathbf{T}\,ds + \int_{-C_2} \mathbf{F} \cdot \mathbf{T}\,ds = \int_{C_1} \mathbf{F} \cdot \mathbf{T}\,ds - \int_{C_2} \mathbf{F} \cdot \mathbf{T}\,ds,$$
so $\int_{C_1} \mathbf{F} \cdot \mathbf{T}\,ds = \int_{C_2} \mathbf{F} \cdot \mathbf{T}\,ds$, showing that line integrals of $\mathbf{F}$ in $E$ are path-independent.
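As a concrete (if informal) illustration of how these properties fit together, the following Python sketch integrates a conservative field $\mathbf{F} = \nabla f$ along two different paths with the same endpoints and compares both values with $f(\text{end}) - f(\text{start})$; the potential $f$ and the two paths are arbitrary choices made for the illustration.

```python
import numpy as np

# A conservative field on R^2: F = grad f for the (arbitrarily chosen) potential below.
f = lambda x, y: np.sin(x) * y + x**2
F = lambda x, y: (np.cos(x) * y + 2 * x, np.sin(x))   # (f_x, f_y)

def line_integral(path, a, b, n=200_000):
    """Approximate the integral of F . T ds along path(t), a <= t <= b, by a midpoint sum."""
    t = a + (np.arange(n) + 0.5) * (b - a) / n
    h = 1e-6
    x, y = path(t)
    xh, yh = path(t + h)
    dx, dy = (xh - x) / h, (yh - y) / h               # approximate tangent vector
    Fx, Fy = F(x, y)
    return ((Fx * dx + Fy * dy) * (b - a) / n).sum()

A, B = np.array([0.0, 0.0]), np.array([1.0, 2.0])
straight = lambda t: (A[0] + t * (B[0] - A[0]), A[1] + t * (B[1] - A[1]))
wiggly   = lambda t: (t * B[0] + 0.3 * np.sin(3 * np.pi * t), t * B[1] + 0.5 * t * (1 - t))

v1 = line_integral(straight, 0, 1)
v2 = line_integral(wiggly, 0, 1)
print(v1, v2, f(*B) - f(*A))   # all three agree: path independence + FTC for line integrals
```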

Path-independence implies conservative. We now show that conservative fields are the only ones with the properties given in the Warm-Up, at least for fields on $\mathbb{R}^2$. To be precise, we show that if $F : \mathbb{R}^2 \to \mathbb{R}^2$ is a continuous vector field such that its line integrals in $\mathbb{R}^2$ are path-independent, then $F$ must be conservative. The same holds for $\mathbb{R}^n$ instead of $\mathbb{R}^2$ using the same proof, only with more components. This result also holds for domains other than $\mathbb{R}^n$, although the proof we give below has to be modified. (We'll point out where the modification comes in.)

Denote the components of $F$ by $F = (P, Q)$, which are each continuous. Fix $(x_0, y_0) \in \mathbb{R}^2$ and define the function $f : \mathbb{R}^2 \to \mathbb{R}$ by
\[
f(x, y) = \int_{C(x,y)} F \cdot T\, ds,
\]
where $C(x, y)$ is any smooth $C^1$ path from $(x_0, y_0)$ to $(x, y)$. The path-independence property of $F$ guarantees that $f$ is well-defined, in that the specific curve $C$ used to connect $(x_0, y_0)$ to $(x, y)$ is irrelevant. The motivation for this definition comes from the Fundamental Theorem of Calculus: if $g : [a, b] \to \mathbb{R}$ is continuous, then the function $G(x) = \int_a^x g(t)\, dt$ obtained by integrating $g$ is differentiable with derivative $g$. We claim that $f$ is differentiable (in fact $C^1$) and has gradient $\nabla f = F$.

First we compute the partial derivative of $f$ with respect to $x$. For this we choose as a path connecting $(x_0, y_0)$ to $(x, y)$ the one consisting of the vertical line segment $L_1$ from $(x_0, y_0)$ to $(x_0, y)$, and then the horizontal line segment $L_2$ from $(x_0, y)$ to $(x, y)$. We have
\[
f(x, y) = \int_{L_1} F \cdot T\, ds + \int_{L_2} F \cdot T\, ds.
\]
Parametrizing $L_1$ with $\phi_1(t) = (x_0, t)$, $y_0 \le t \le y$, gives
\[
\int_{L_1} (P, Q) \cdot T\, ds = \int_{y_0}^{y} (P(x_0, t), Q(x_0, t)) \cdot (0, 1)\, dt = \int_{y_0}^{y} Q(x_0, t)\, dt,
\]
and parametrizing $L_2$ with $\phi_2(t) = (t, y)$, $x_0 \le t \le x$, gives
\[
\int_{L_2} (P, Q) \cdot T\, ds = \int_{x_0}^{x} (P(t, y), Q(t, y)) \cdot (1, 0)\, dt = \int_{x_0}^{x} P(t, y)\, dt,
\]
so
\[
f(x, y) = \int_{y_0}^{y} Q(x_0, t)\, dt + \int_{x_0}^{x} P(t, y)\, dt.
\]
Differentiating with respect to $x$ gives
\[
f_x(x, y) = 0 + P(x, y) = P(x, y),
\]
where the first term is zero since the first integral is independent of $x$, and the second term is $P(x, y)$ by the Fundamental Theorem of Calculus. (Note that the assumption that $P$ is continuous is used here to guarantee that the derivative of the second integral is indeed $P(x, y)$.)

Now, to compute $f_y(x, y)$ we choose a path consisting of the line segment $C_1$ from $(x_0, y_0)$ to $(x, y_0)$ and the line segment $C_2$ from $(x, y_0)$ to $(x, y)$. Then after parametrizing these segments we can derive that
\[
f(x, y) = \int_{C_1} F \cdot T\, ds + \int_{C_2} F \cdot T\, ds = \int_{x_0}^{x} P(t, y_0)\, dt + \int_{y_0}^{y} Q(x, t)\, dt.
\]
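This construction of a potential by integrating along a vertical-then-horizontal path is easy to carry out numerically. The following is a minimal sketch, not from the notes, using the made-up conservative field $F = (P, Q) = (y + \cos x,\ x)$, which is $\nabla(xy + \sin x)$, and base point $(x_0, y_0) = (0, 0)$; the numerically constructed $f$ has partial derivatives matching $P$ and $Q$.

    import numpy as np
    from scipy.integrate import quad

    # Made-up conservative field: F = (P, Q) = grad(x*y + sin x).
    P = lambda x, y: y + np.cos(x)
    Q = lambda x, y: x

    x0, y0 = 0.0, 0.0   # base point

    def f(x, y):
        # Potential built as in the proof: integrate Q up the vertical segment from
        # (x0, y0) to (x0, y), then P along the horizontal segment from (x0, y) to (x, y).
        up,   _ = quad(lambda t: Q(x0, t), y0, y)
        over, _ = quad(lambda t: P(t, y), x0, x)
        return up + over

    x, y, h = 1.3, -0.7, 1e-6
    fx = (f(x + h, y) - f(x - h, y)) / (2 * h)   # numerical partial f_x
    fy = (f(x, y + h) - f(x, y - h)) / (2 * h)   # numerical partial f_y
    print(fx, P(x, y))   # should agree
    print(fy, Q(x, y))   # should agree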

Differentiating with respect to $y$ gives $0$ for the first integral and $Q(x, y)$ for the second, so $f_y(x, y) = Q(x, y)$, and hence
\[
\nabla f = (f_x, f_y) = (P, Q) = F,
\]
showing that $F$ is indeed conservative on $\mathbb{R}^2$.

Important. For a vector field $F : D \to \mathbb{R}^n$ which is $C^1$ on an open, connected region $D \subseteq \mathbb{R}^n$, the properties that $F$ is conservative on $D$, that $F$ has path-independent line integrals in $D$, and that the line integral of $F$ over any closed curve in $D$ is zero are all equivalent to one another.

Remarks. As mentioned before, the same is true for vector fields on $\mathbb{R}^n$ with the path-independence property: we just repeat the same argument as above when differentiating with respect to other variables, using appropriately chosen paths consisting of various line segments. Also, the argument we gave works for other domains, as long as the line segments used are always guaranteed to lie in those domains. This is true, for instance, for convex domains. More generally, a similar argument works for even more general domains, open connected ones in particular, but modifications are required since the line segments used may no longer be entirely contained in such domains; the way around this is to take paths consisting of multiple short horizontal and vertical segments, where we move from $(x_0, y_0)$ a bit to the right, then a bit up, then a bit to the right, then a bit up, and so on until we reach $(x, y)$. It can be shown that there is a way to do this while always remaining within the given domain. (Such a path is called a polygonal path, and it was a problem earlier in the book, which was never assigned, to show that such paths always exist in open connected regions.)

One last thing to note. Let us go back to the expression derived in the proof above when wanting to compute $f_x(x, y)$:
\[
f(x, y) = \int_{y_0}^{y} Q(x_0, t)\, dt + \int_{x_0}^{x} P(t, y)\, dt.
\]
When computing $f_y(x, y)$ we opted to use another path, but there is no reason why we couldn't try to differentiate this same expression with respect to $y$ instead. The first term is differentiable with respect to $y$ by the Fundamental Theorem of Calculus, and to guarantee that the second term is differentiable with respect to $y$ we assume that $P$ is $C^1$. Then the technique of differentiation under the integral sign (given back at the beginning of Chapter 11 in the book) says that the operations of differentiation and integration can be exchanged:
\[
\frac{\partial}{\partial y} \int_{x_0}^{x} P(t, y)\, dt = \int_{x_0}^{x} \frac{\partial P}{\partial y}(t, y)\, dt,
\]
so we get
\[
f_y(x, y) = Q(x_0, y) + \int_{x_0}^{x} P_y(t, y)\, dt.
\]
Since we know $\nabla f = F$, it must be that this expression equals $Q(x, y)$:
\[
Q(x, y) = Q(x_0, y) + \int_{x_0}^{x} P_y(t, y)\, dt.
\]
This is indeed true(!), and is a reflection of the fact that $P_y = Q_x$ for a conservative $C^1$ vector field: $P_y = f_{xy} = f_{yx} = Q_x$ by Clairaut's Theorem, where $f$ being $C^2$ is equivalent to $F = \nabla f$ being $C^1$.

Similarly, differentiating the expression
\[
f(x, y) = \int_{x_0}^{x} P(t, y_0)\, dt + \int_{y_0}^{y} Q(x, t)\, dt
\]
obtained when wanting to compute $f_y(x, y)$ with respect to $x$ instead gives
\[
P(x, y) = P(x, y_0) + \int_{y_0}^{y} Q_x(x, t)\, dt,
\]
which is also a true equality and reflects $P_y = Q_x$ as well.

Green's Theorem. Green's Theorem relates line integrals to double integrals, and is a special case of Stokes' Theorem, which we'll talk about next time. The statement is as follows. Suppose that $D \subseteq \mathbb{R}^2$ is a two-dimensional region whose boundary $\partial D$ is a piecewise-smooth, simple $C^1$ curve with positive orientation, and that $F = (P, Q) : D \to \mathbb{R}^2$ is a $C^1$ vector field on $D$. Then
\[
\int_{\partial D} (P, Q) \cdot T\, ds = \iint_D (Q_x - P_y)\, dA.
\]
The positive orientation on $\partial D$ is the one where, if you were to walk along $\partial D$ in that direction, the region $D$ would be on your left side. (This is a bit of a hand-wavy definition, but it is good enough for most purposes. Giving a more precise definition of the positive orientation on the boundary would require having a more precise definition of orientation, which is better left to a full-blown course in differential geometry.)

The proof of Green's Theorem is in the book. There are two key ideas: first, prove Green's Theorem in the special case where $D$ is particularly nice, namely the case where pieces of its boundary can be described as graphs of single-variable functions, and second, glue such nice regions together to get a more general $D$. The second step involves using the Implicit Function Theorem to say that smoothness of $\partial D$ implies you can indeed describe portions of $\partial D$ as single-variable graphs. The first step boils down to the observation that the types of expressions
\[
P(x, g_1(x)) - P(x, g_2(x)) \quad \text{and} \quad Q(f_2(y), y) - Q(f_1(y), y)
\]
you end up with (after using parametric equations) can be written, due to the continuity of $P_y$ and $Q_x$, as integrals:
\[
P(x, g_1(x)) - P(x, g_2(x)) = \int_{g_2(x)}^{g_1(x)} P_y(x, t)\, dt \quad \text{and} \quad Q(f_2(y), y) - Q(f_1(y), y) = \int_{f_1(y)}^{f_2(y)} Q_x(t, y)\, dt
\]
as a consequence of the Fundamental Theorem of Calculus. This introduction of an additional integral is what turns the line integral on one side of Green's Theorem into the double integral on the other side, and is where the $Q_x - P_y$ integrand comes from. Check the book for full details. I point this out now since the same idea is involved in the proof of Gauss's Theorem, suggesting that Green's Theorem and Gauss's Theorem are the same type of result, which is an idea we'll come back to on the final day of class.

Important. Green's Theorem converts line integrals into double integrals, and its proof boils down to an application of the single-variable Fundamental Theorem of Calculus.
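As a sanity check, both sides of Green's Theorem can be computed numerically for a concrete example. The following is a minimal sketch, not from the notes, for the made-up field $F = (P, Q) = (x - y^3,\ x^3 + y^2)$ on the unit disk, where both sides come out to $3\pi/2$.

    import numpy as np
    from scipy import integrate

    # Made-up field F = (P, Q) = (x - y**3, x**3 + y**2) on the unit disk D.
    P = lambda x, y: x - y ** 3
    Q = lambda x, y: x ** 3 + y ** 2
    Qx_minus_Py = lambda x, y: 3 * x ** 2 + 3 * y ** 2

    # Line integral over the positively oriented boundary circle x = cos t, y = sin t.
    def integrand(t):
        x, y = np.cos(t), np.sin(t)
        dx, dy = -np.sin(t), np.cos(t)
        return P(x, y) * dx + Q(x, y) * dy

    line, _ = integrate.quad(integrand, 0, 2 * np.pi)

    # Double integral of Qx - Py over the unit disk, in polar coordinates.
    double, _ = integrate.dblquad(
        lambda r, t: Qx_minus_Py(r * np.cos(t), r * np.sin(t)) * r,
        0, 2 * np.pi, 0, 1)

    print(line, double)   # both approximately 3*pi/2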

May 27, 2015: Stokes' Theorem

Today we spoke about Stokes' Theorem, the next Big Theorem of Vector Calculus. Stokes' Theorem relates line integrals to surface integrals, and is a generalization of Green's Theorem to curved surfaces; its proof (at least the one we'll outline) uses the two-dimensional Green's Theorem at a key step.

Warm-Up 1. Suppose that $F = (P, Q) : U \to \mathbb{R}^2$ is a $C^1$ vector field on an open set $U \subseteq \mathbb{R}^2$. We show that for any $p \in U$:
\[
(Q_x - P_y)(p) = \lim_{r \to 0^+} \frac{1}{\pi r^2} \int_{\partial B_r(p)} F \cdot T\, ds,
\]
where $\partial B_r(p)$ is oriented counterclockwise for any $r > 0$. Note that $U$ being open guarantees that for small enough $r > 0$ the disk $B_r(p)$ and its boundary circle $\partial B_r(p)$ are contained in $U$, so that the line integral in question makes sense since $F$ is defined along points of $\partial B_r(p)$. This justifies the geometric interpretation of the quantity $Q_x - P_y$ you might have seen in a previous course: it measures the circulation of $F$ around any given point, where counterclockwise circulations are counted as positive and clockwise circulations as negative. Indeed, the line integral
\[
\int_{\partial B_r(p)} F \cdot T\, ds
\]
measures the circulation of $F$ around the circle of radius $r$ around $p$, and so the limit where we take this circle getting smaller and smaller and closing in on $p$ is interpreted as the circulation around $p$ itself. This is a special case of the geometric interpretation of the curl of a vector field, which we'll come to after stating Stokes' Theorem.

By Green's Theorem, we have
\[
\frac{1}{\pi r^2} \int_{\partial B_r(p)} F \cdot T\, ds = \frac{1}{\operatorname{Vol}(B_r(p))} \iint_{B_r(p)} (Q_x(x, y) - P_y(x, y))\, d(x, y).
\]
Since $F$ is $C^1$, $Q_x - P_y$ is continuous, so a problem from a previous homework shows that
\[
\lim_{r \to 0^+} \frac{1}{\operatorname{Vol}(B_r(p))} \iint_{B_r(p)} (Q_x(x, y) - P_y(x, y))\, d(x, y) = (Q_x - P_y)(p),
\]
which then gives the desired equality
\[
(Q_x - P_y)(p) = \lim_{r \to 0^+} \frac{1}{\operatorname{Vol}(B_r(p))} \iint_{B_r(p)} (Q_x - P_y)\, d(x, y) = \lim_{r \to 0^+} \frac{1}{\pi r^2} \int_{\partial B_r(p)} F \cdot T\, ds.
\]
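This circulation-density limit is easy to see numerically. Here is a minimal sketch, not from the notes, using the made-up field $F = (0, x^3)$, for which $Q_x - P_y = 3x^2$ equals $3$ at the point $p = (1, 2)$; the normalized circulations over shrinking circles approach this value.

    import numpy as np
    from scipy.integrate import quad

    # Made-up field: F = (P, Q) = (0, x**3), so Qx - Py = 3*x**2.
    P = lambda x, y: 0.0
    Q = lambda x, y: x ** 3

    def circulation_density(p, r):
        # (1 / (pi r^2)) times the line integral of F . T ds over the circle of
        # radius r about p, traversed counterclockwise.
        def integrand(t):
            x, y = p[0] + r * np.cos(t), p[1] + r * np.sin(t)
            dx, dy = -r * np.sin(t), r * np.cos(t)
            return P(x, y) * dx + Q(x, y) * dy
        return quad(integrand, 0, 2 * np.pi)[0] / (np.pi * r ** 2)

    p = (1.0, 2.0)                            # (Qx - Py)(p) = 3
    for r in (1.0, 0.1, 0.01):
        print(r, circulation_density(p, r))   # 3.75, 3.0075, 3.000075, ... -> 3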

Warm-Up 2. Let $F = (P, Q)$ be the vector field
\[
F(x, y) = \left( \frac{-y}{x^2 + y^2},\ \frac{x}{x^2 + y^2} \right).
\]
We show that $F$ is conservative on the set $\mathbb{R}^2 \setminus \{\text{nonnegative } x\text{-axis}\}$ obtained by removing the origin and the positive $x$-axis from $\mathbb{R}^2$. Let $C$ be any simple smooth closed curve in $\mathbb{R}^2 \setminus \{\text{nonnegative } x\text{-axis}\}$ and let $D$ be the region it encloses, so that $C = \partial D$. Then $F$ is $C^1$ on $D$, and so Green's Theorem gives
\[
\int_C F \cdot T\, ds = \pm \iint_D (Q_x - P_y)\, dA,
\]
where the $\pm$ sign depends on the orientation of $C$. For this field we have
\[
Q_x = \frac{y^2 - x^2}{(x^2 + y^2)^2} = P_y,
\]
so $Q_x - P_y = 0$ and hence $\iint_D (Q_x - P_y)\, dA = 0$. Thus the line integral of $F$ over any simple smooth closed curve in $\mathbb{R}^2 \setminus \{\text{nonnegative } x\text{-axis}\}$ is zero, and so by an equivalence we established last time, this implies that $F$ is conservative on $\mathbb{R}^2 \setminus \{\text{nonnegative } x\text{-axis}\}$.

Observation. Now, knowing that the field $F$ above is conservative on $\mathbb{R}^2 \setminus \{\text{nonnegative } x\text{-axis}\}$, we can try to find a potential function for $F$ over this region, which is a $C^1$ function $f : \mathbb{R}^2 \setminus \{\text{nonnegative } x\text{-axis}\} \to \mathbb{R}$ satisfying $\nabla f = F$. This is not something you'd be expected to be able to do on the final, and I'm just mentioning it in order to illustrate an important property of this specific vector field.

To start with, a direct computation involving derivatives of the arctangent function will show that
\[
\nabla \left( \tan^{-1} \tfrac{y}{x} \right) = F.
\]
However, $f(x, y) = \tan^{-1}(y/x)$ is not defined for $x = 0$, so this does not give us a potential function over all of $\mathbb{R}^2 \setminus \{\text{nonnegative } x\text{-axis}\}$ yet. For now, we view this as defining our sought-after potential only in the first quadrant. The idea is now to extend this function to the second, third, and fourth quadrants in a way which ensures that we have $\nabla f = F$ at each step along the way. Another direct computation shows that the following is also true:
\[
\nabla \left( -\tan^{-1} \tfrac{x}{y} \right) = F.
\]
This is good since this new potential function is defined for $x = 0$, although it is no longer defined for $y = 0$. The point is that we will only try to use this expression to define the sought-after potential over the second quadrant and on the positive $y$-axis, on which the potential we used above over the first quadrant was not defined. However, we have to do this in a way which guarantees that the potential we're defining (by piecing together local potentials) over the first and second quadrants is in fact $C^1$ even along the positive $y$-axis.

For $(x, y)$ in the first quadrant, $y/x > 0$, so as $x \to 0^+$ we have $y/x \to +\infty$, and thus $\tan^{-1}(y/x) \to \frac{\pi}{2}$. The function $-\tan^{-1}(x/y)$ has value $0$ when $x = 0$, so the function
\[
-\tan^{-1} \tfrac{x}{y} + \tfrac{\pi}{2}
\]
has value $\frac{\pi}{2}$ on the positive $y$-axis. Thus the function defined by $\tan^{-1}(y/x)$ on the first quadrant and $-\tan^{-1}(x/y) + \frac{\pi}{2}$ on the second quadrant will indeed be $C^1$ even on the positive $y$-axis. Since adding a constant to a function does not alter its gradient, this piece over the second quadrant still has gradient equal to $F$, so now we have a $C^1$ function on the upper half-plane which has gradient $F$. We continue in this way, using the two arctangent expressions above to define the sought-after potential over the remaining quadrants, where we add the necessary constants to ensure that we still get $C^1$ functions along the way as we jump from one quadrant to another:

For $(x, y)$ in the second quadrant, $x/y < 0$, so as $y \to 0^+$ (i.e. as we approach the negative $x$-axis) we have $x/y \to -\infty$, and thus
\[
-\tan^{-1} \tfrac{x}{y} + \tfrac{\pi}{2} \to -\left( -\tfrac{\pi}{2} \right) + \tfrac{\pi}{2} = \pi.
\]
Thus defining the potential function over the third quadrant and the negative $x$-axis to be
\[
\tan^{-1} \tfrac{y}{x} + \pi
\]
maintains the $C^1$ condition and the gradient-equals-$F$ condition.

For $(x, y)$ in the third quadrant, $y/x > 0$, so as $x \to 0^-$ we get $\tan^{-1}(y/x) + \pi \to \frac{3\pi}{2}$, so we define our potential to be
\[
-\tan^{-1} \tfrac{x}{y} + \tfrac{3\pi}{2}
\]
over the fourth quadrant and on the negative $y$-axis.

The conclusion is that the function $f : \mathbb{R}^2 \setminus \{\text{nonnegative } x\text{-axis}\} \to \mathbb{R}$ defined by
\[
f(x, y) =
\begin{cases}
\tan^{-1}(y/x) & x > 0,\ y > 0 \\
-\tan^{-1}(x/y) + \frac{\pi}{2} & x \le 0,\ y > 0 \\
\tan^{-1}(y/x) + \pi & x < 0,\ y \le 0 \\
-\tan^{-1}(x/y) + \frac{3\pi}{2} & x \ge 0,\ y < 0
\end{cases}
\]
is $C^1$ and satisfies $\nabla f = F$, so $f$ is a potential function for the conservative field $F$ over the region $\mathbb{R}^2 \setminus \{\text{nonnegative } x\text{-axis}\}$. (Geometrically, $f(x, y)$ is just the polar angle of $(x, y)$, measured in $(0, 2\pi)$.)

Now, note what happens if we take the same expression above, only now we consider it as a function defined on $\mathbb{R}^2 \setminus \{(0, 0)\}$, so including the positive $x$-axis. This seems plausible since the portion $\tan^{-1}(y/x)$ defining $f$ over the first quadrant is in fact defined when $y = 0$, so we could change the first case in the definition above to hold for $x > 0$, $y \ge 0$, thereby giving a well-defined function on $\mathbb{R}^2 \setminus \{(0, 0)\}$. However, if we take the limit of the portion of $f$ defined in the fourth quadrant as $(x, y)$ approaches the positive $x$-axis from below, we get a value of $2\pi$, which means that in order to preserve continuity the portion defined over the first quadrant would have to be
\[
\tan^{-1} \tfrac{y}{x} + 2\pi
\]
instead of simply $\tan^{-1}(y/x)$. Since we're off by an additional $2\pi$, this says that there is no way to extend the definition of $f$ above to the positive $x$-axis so that it remains continuous, let alone $C^1$, which suggests that $F = \left( \frac{-y}{x^2+y^2},\ \frac{x}{x^2+y^2} \right)$ should actually NOT be conservative over the punctured plane $\mathbb{R}^2 \setminus \{(0, 0)\}$. This is in fact true, as we can see from the fact that
\[
\int_{\text{unit circle, counterclockwise}} F \cdot T\, ds = 2\pi \ne 0.
\]
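That last line integral, and the fact that circles avoiding the origin give zero, can be checked numerically. The following is a minimal sketch, not from the notes.

    import numpy as np
    from scipy.integrate import quad

    # The field F = (-y, x) / (x**2 + y**2), integrated over counterclockwise circles.
    def F(x, y):
        r2 = x ** 2 + y ** 2
        return np.array([-y / r2, x / r2])

    def circle_integral(cx, cy, r):
        # Line integral of F . T ds over the circle of radius r centered at (cx, cy).
        def integrand(t):
            x, y = cx + r * np.cos(t), cy + r * np.sin(t)
            return F(x, y) @ np.array([-r * np.sin(t), r * np.cos(t)])
        return quad(integrand, 0, 2 * np.pi)[0]

    print(circle_integral(0, 0, 1))    # encircles the origin: approximately 2*pi
    print(circle_integral(3, 0, 1))    # does not encircle the origin: approximately 0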

The fact that $F$ is conservative over $\mathbb{R}^2 \setminus \{\text{nonnegative } x\text{-axis}\}$ but not over $\mathbb{R}^2 \setminus \{(0, 0)\}$ is crucial in many applications of vector calculus, and in particular is related to the fact in complex analysis that there is no version of the complex log function which is differentiable on the set of all nonzero complex numbers; if you want a differentiable log function in complex analysis, you must delete half of an axis, or more generally a ray emanating from the origin in $\mathbb{C}$. For this given field $F$, the line integral over any simple closed curve can have only one of three values:
\[
\int_C \left( \frac{-y}{x^2 + y^2},\ \frac{x}{x^2 + y^2} \right) \cdot T\, ds =
\begin{cases}
2\pi & \text{if } C \text{ encircles the origin counterclockwise} \\
-2\pi & \text{if } C \text{ encircles the origin clockwise} \\
0 & \text{if } C \text{ does not encircle the origin.}
\end{cases}
\]

Manifold boundary. Before talking about Stokes' Theorem, we must clarify a new use of the word "boundary" we will see. Given a surface $S$, we define its manifold boundary as follows. Given a point in $S$, if we zoom in on $S$ near this point we will see one of two things: either $S$ near this point will look like an entire disk, or $S$ near this point will look like a half-disk. The points near which $S$ looks like a half-disk are called the manifold boundary points of $S$, and the manifold boundary $\partial S$ of $S$ is the curve consisting of the manifold boundary points. Intuitively, the manifold boundary is the curve describing where there is an opening into $S$. A closed surface is one which has no manifold boundary; for instance, a sphere is a closed surface.

Notationally, we denote manifold boundaries using the same symbol as we had for topological boundaries, and which type of boundary we mean should be clear from context. To distinguish this from our previous notion of boundary, we might refer to the previous notion as the topological boundary of a set. In general, the manifold and topological boundaries of a surface are quite different; they agree only for a surface fully contained in the $xy$-plane viewed as a subset of $\mathbb{R}^2$. Similar notions of manifold boundary can be given for objects other than surfaces: the manifold boundary of a curve is the set of its endpoints, and the manifold boundary of a three-dimensional solid in $\mathbb{R}^3$ is actually the same as its topological boundary.

An orientation on $S$ induces a corresponding orientation on $\partial S$, which we call the positive orientation on $\partial S$. The simplest way to describe this orientation is likely what you saw in a previous course: if you stand on $\partial S$ with your head pointing in the direction of the normal vector determined by the orientation on $S$, the positive orientation on $\partial S$ corresponds to the direction you have to walk in order to have $S$ on your left side. It is possible to give a more precise definition of this positive orientation, but this would require, as usual, a more precise definition of orientation in terms of linear algebra and differential geometry.

Stokes' Theorem. Suppose that $S$ is a piecewise-smooth oriented $C^2$ surface whose boundary $\partial S$ is a piecewise-smooth $C^1$ curve, oriented positively. Stokes' Theorem says that if $F = (P, Q, R) : S \to \mathbb{R}^3$ is a $C^1$ vector field, then
\[
\int_{\partial S} F \cdot T\, ds = \iint_S \operatorname{curl} F \cdot n\, dS,
\]
where
\[
\operatorname{curl} F = (R_y - Q_z,\ P_z - R_x,\ Q_x - P_y)
\]
is the curl of $F$, which is a continuous vector field $\operatorname{curl} F : S \to \mathbb{R}^3$. Thus, Stokes' Theorem relates line integrals on one side to surface integrals of curls on the other. We note that Green's Theorem is the special case where $S$ is a surface fully contained in the $xy$-plane. Indeed, here the normal vector $n$ is $k = (0, 0, 1)$ and
\[
\operatorname{curl} F \cdot n = Q_x - P_y,
\]
so the surface integral in Stokes' Theorem becomes simply $\iint_S (Q_x - P_y)\, dA$ as in Green's Theorem. One may ask: if Green's Theorem is a consequence of Stokes' Theorem, why prove Green's Theorem first instead of simply proving Stokes' Theorem and deriving Green's Theorem from it? The answer, as we'll see, is that the proof of Stokes' Theorem uses Green's Theorem in a crucial way.

Geometric meaning of curl. Suppose that $F : \mathbb{R}^3 \to \mathbb{R}^3$ is a $C^1$ vector field, so that $\operatorname{curl} F$ is defined, and let $n$ be a unit vector in $\mathbb{R}^3$. Fix $p \in \mathbb{R}^3$ and let $D_r(p)$ be the disk of radius $r$ centered at $p$ in the plane which is orthogonal to $n$ at $p$. Then we have
\[
\int_{\partial D_r(p)} F \cdot T\, ds = \iint_{D_r(p)} \operatorname{curl} F \cdot n\, dS
\quad \text{and} \quad
\lim_{r \to 0^+} \frac{1}{\pi r^2} \iint_{D_r(p)} \operatorname{curl} F \cdot n\, dS = \operatorname{curl} F(p) \cdot n,
\]
since the function $q \mapsto \operatorname{curl} F(q) \cdot n$ is continuous. Thus we get
\[
\operatorname{curl} F(p) \cdot n = \lim_{r \to 0^+} \frac{1}{\pi r^2} \int_{\partial D_r(p)} F \cdot T\, ds,
\]
giving us the interpretation that $\operatorname{curl} F(p)$ measures the circulation of $F$ around the point $p$; in particular, $\operatorname{curl} F(p) \cdot n$ measures the amount of this circulation which occurs in the plane orthogonal to $n$. This is the geometric interpretation of curl you might have seen in a previous calculus course, and the point is that it follows from Stokes' Theorem and the homework problem which tells us how to compute the limit above.

Proof of Stokes' Theorem. The proof of Stokes' Theorem is in the book, and the basic strategy is as follows. As in the proof of Green's Theorem, we look at a special case first and then piece together these special surfaces to build up the general statement; the special case here is that of a surface $S$ given by the graph $z = f(x, y)$ of a $C^2$ function.

Let $E$ be the portion of the $xy$-plane lying below $S$, and let $(x(t), y(t))$, $t \in [a, b]$, be parametric equations for $\partial E$. Then
\[
\phi(t) = (x(t), y(t), f(x(t), y(t))), \quad t \in [a, b],
\]
are parametric equations for $\partial S$. Using this we can express the left-hand side of Stokes' Theorem as
\[
\int_{\partial S} F \cdot T\, ds = \int_a^b F(\phi(t)) \cdot \phi'(t)\, dt.
\]
After computing $\phi'(t)$ and writing out this dot product, the resulting expression can be written as another dot product of the form
\[
(\text{some two-dimensional field}) \cdot (x'(t), y'(t)),
\]
which is the type of expression you get in two-dimensional line integrals. Indeed, this will write the line integral over $\partial S$ as a line integral over $\partial E$ instead:
\[
\int_{\partial S} F \cdot T\, ds = \int_{\partial E} (\text{some two-dimensional field}) \cdot T\, ds.
\]
To this we can now apply Green's Theorem, which will express this line integral over $\partial E$ as a double integral over $E$:
\[
\int_{\partial E} (\text{some two-dimensional field}) \cdot T\, ds = \iint_E (\text{some function})\, d(x, y).
\]
Finally, the integrand in this double integral can be written as a dot product which looks like
\[
(R_y - Q_z,\ P_z - R_x,\ Q_x - P_y) \cdot (\text{normal vector to } S),
\]
so that the double integral describes what you get when you use the parametrization $\psi(x, y) = (x, y, f(x, y))$, $(x, y) \in E$, to compute the surface integral of $\operatorname{curl} F$ over $S$:
\[
\iint_E (\text{some function})\, d(x, y) = \iint_S \operatorname{curl} F \cdot n\, dS.
\]
To recap, the process is: write the line integral of $F$ over $\partial S$ as a line integral over $\partial E$ instead, apply Green's Theorem to get a double integral over $E$, and rewrite this double integral as the surface integral of $\operatorname{curl} F$ over $S$. Check the book for full details.

The tricky thing is getting all the computations correct. For instance, in the first step you have to compute $\phi'(t)$ for $\phi(t) = (x(t), y(t), f(x(t), y(t)))$, where differentiating the third component $z(t) = f(x(t), y(t))$ requires a chain rule:
\[
z'(t) = f_x x'(t) + f_y y'(t).
\]
Thus
\[
(P, Q, R) \cdot (x'(t),\ y'(t),\ f_x x'(t) + f_y y'(t)) = (P + R f_x,\ Q + R f_y) \cdot (x'(t), y'(t)),
\]
so that the "some two-dimensional field" referred to in the outline above is the field $(P + R f_x,\ Q + R f_y)$. This is the field to which Green's Theorem is applied, in which we need to know
\[
\frac{\partial}{\partial x}(Q + R f_y) - \frac{\partial}{\partial y}(P + R f_x).
\]
Each of these derivatives again requires a chain rule, since $Q$ and $R$ depend on $x$ in two ways:
\[
Q = Q(x, y, f(x, y)), \quad R = R(x, y, f(x, y)),
\]
and $P$ and $R$ depend on $y$ in two ways:
\[
P = P(x, y, f(x, y)), \quad R = R(x, y, f(x, y)).
\]
We have
\[
\frac{\partial}{\partial x} Q(x, y, f(x, y)) = Q_x + Q_z f_x \quad \text{and} \quad \frac{\partial}{\partial y} P(x, y, f(x, y)) = P_y + P_z f_y,
\]
and similar expressions for the derivatives of $R$. All of this together gives the "some function" referred to in the outline above. Note that in this step there will be some simplifications in the resulting expression using the fact that $f_{yx} = f_{xy}$ since $f$ is a $C^2$ function. We'll stop here and leave everything else to the book, but notice how the curl of $F$ naturally pops out of this derivation, in particular in the "some function" in the outline. Indeed, it was through this derivation that the curl was first discovered: historically, it's not as if someone randomly wrote down the expression $(R_y - Q_z, P_z - R_x, Q_x - P_y)$ and then wondered what interesting properties this might have; rather, this expression showed up in the proof of Stokes' Theorem, and only after that was it identified as an object worthy of independent study. The name "curl" historically came from the geometric interpretation in terms of circulation we gave earlier.
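The bookkeeping above can be spot-checked symbolically. The following is a minimal sketch, not from the notes, which takes a made-up field $F = (P, Q, R)$ and a made-up $C^2$ graph $z = f(x, y)$ and checks that the Green's Theorem integrand $\partial_x(Q + R f_y) - \partial_y(P + R f_x)$ agrees with $\operatorname{curl} F \cdot (-f_x, -f_y, 1)$, where $(-f_x, -f_y, 1) = \psi_x \times \psi_y$ is the normal vector produced by the graph parametrization.

    import sympy as sp

    x, y, z = sp.symbols('x y z')

    # Made-up field F = (P, Q, R) and made-up C^2 graph z = f(x, y).
    P = x * y * z
    Q = sp.sin(x) + y * z ** 2
    R = x + y ** 2 + z
    f = x ** 2 - x * y + 2 * y

    fx, fy = sp.diff(f, x), sp.diff(f, y)
    on_graph = lambda expr: expr.subs(z, f)            # evaluate at (x, y, f(x, y))

    # The planar field (p, q) = (P + R f_x, Q + R f_y) from the proof outline.
    p = on_graph(P + R * fx)
    q = on_graph(Q + R * fy)

    # Green's Theorem integrand q_x - p_y versus curl F . (-f_x, -f_y, 1).
    green_integrand = sp.diff(q, x) - sp.diff(p, y)
    curl = (sp.diff(R, y) - sp.diff(Q, z),
            sp.diff(P, z) - sp.diff(R, x),
            sp.diff(Q, x) - sp.diff(P, y))
    stokes_integrand = sum(on_graph(c) * nc for c, nc in zip(curl, (-fx, -fy, 1)))

    print(sp.simplify(green_integrand - stokes_integrand))   # prints 0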

Important. Stokes' Theorem relates the surface integral of the curl of a field over a surface to the line integral of the uncurled field over the (manifold) boundary of that surface. The power of Stokes' Theorem in practice comes from relating one-dimensional objects (line integrals) to two-dimensional objects (surface integrals), and from giving geometric meaning to $\operatorname{curl} F$.

May 29, 2015: Gauss's Theorem

Today we spoke about Gauss's Theorem, the last of the Big Theorems of Vector Calculus. Gauss's Theorem relates surface integrals and triple integrals, and should be viewed in the same vein as the Fundamental Theorem of Calculus, Green's Theorem, and Stokes' Theorem: as a theorem which tells us how an integral over the boundary of an object can be expressed as an integral over the entire object itself.

Warm-Up 1. We verify Stokes' Theorem (i.e. compute both integrals in Stokes' Theorem to see that they are equal) for the vector field
\[
F(x, y, z) = (2y - z,\ x + y^2 - z,\ 4y - 3x)
\]
and the portion $S$ of the sphere $x^2 + y^2 + z^2 = 4$ where $z \le 0$, with outward orientation.

First we compute $\iint_S \operatorname{curl} F \cdot n\, dS$. We have
\[
\operatorname{curl} F = (5, 2, -1).
\]
At a point $(x, y, z)$ on $S$, the vector going from the origin to $(x, y, z)$ itself is normal to $S$, so $n = \frac{1}{2}(x, y, z)$ is an outward unit normal vector to $S$ at $(x, y, z)$. Thus
\[
\iint_S \operatorname{curl} F \cdot n\, dS = \frac{1}{2} \iint_S (5x + 2y - z)\, dS.
\]
The integral of $5x + 2y$ over $S$ is zero due to symmetry (the integrand is odd with respect to a variable the surface is symmetric with respect to), so
\[
\iint_S \operatorname{curl} F \cdot n\, dS = -\frac{1}{2} \iint_S z\, dS.
\]
Parametrizing $S$ using
\[
\psi(\varphi, \theta) = (2 \sin\varphi \cos\theta,\ 2 \sin\varphi \sin\theta,\ 2 \cos\varphi), \quad (\varphi, \theta) \in [\tfrac{\pi}{2}, \pi] \times [0, 2\pi],
\]
we have
\[
\psi_\varphi \times \psi_\theta = (2 \cos\varphi \cos\theta,\ 2 \cos\varphi \sin\theta,\ -2 \sin\varphi) \times (-2 \sin\varphi \sin\theta,\ 2 \sin\varphi \cos\theta,\ 0) = (4 \sin^2\varphi \cos\theta,\ 4 \sin^2\varphi \sin\theta,\ 4 \sin\varphi \cos\varphi),
\]
so $\|\psi_\varphi \times \psi_\theta\| = 4 \sin\varphi$. Hence
\[
-\frac{1}{2} \iint_S z\, dS = -\frac{1}{2} \int_0^{2\pi} \int_{\pi/2}^{\pi} (2 \cos\varphi)(4 \sin\varphi)\, d\varphi\, d\theta = -\frac{1}{2} \int_0^{2\pi} (-4)\, d\theta = 4\pi
\]
is the value of $\iint_S \operatorname{curl} F \cdot n\, dS$.

Now we compute $\int_{\partial S} F \cdot T\, ds$. The (manifold) boundary of $S$ is the circle of radius $2$ in the $xy$-plane, and the orientation induced by the outward orientation on $S$ is the clockwise one (viewed from above). We parametrize $\partial S$ using
\[
\phi(t) = (2 \cos t,\ -2 \sin t,\ 0), \quad 0 \le t \le 2\pi,
\]
so that
\[
\int_{\partial S} F \cdot T\, ds = \int_0^{2\pi} F(\phi(t)) \cdot \phi'(t)\, dt = \int_0^{2\pi} (-4 \sin t,\ 2 \cos t + 4 \sin^2 t,\ -8 \sin t - 6 \cos t) \cdot (-2 \sin t,\ -2 \cos t,\ 0)\, dt = \int_0^{2\pi} (8 \sin^2 t - 4 \cos^2 t - 8 \sin^2 t \cos t)\, dt = 4\pi
\]
after using $\sin^2 t = \frac{1}{2}(1 - \cos 2t)$ and $\cos^2 t = \frac{1}{2}(1 + \cos 2t)$. Thus $\int_{\partial S} F \cdot T\, ds = \iint_S \operatorname{curl} F \cdot n\, dS$, as claimed by Stokes' Theorem.
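Both integrals in this Warm-Up can also be approximated numerically as a check on the hand computation. The sketch below, not from the notes, uses the same parametrizations and should print values close to $4\pi \approx 12.566$.

    import numpy as np
    from scipy.integrate import dblquad, quad

    # Numerical check of the Warm-Up: F = (2y - z, x + y^2 - z, 4y - 3x) on the
    # lower hemisphere of radius 2 with outward orientation.
    def F(x, y, z):
        return np.array([2 * y - z, x + y ** 2 - z, 4 * y - 3 * x])

    def curlF(x, y, z):
        return np.array([5.0, 2.0, -1.0])   # computed by hand above

    def surface_integrand(phi, theta):
        # psi(phi, theta) parametrizes S; psi_phi x psi_theta is the outward normal.
        x, y, z = 2*np.sin(phi)*np.cos(theta), 2*np.sin(phi)*np.sin(theta), 2*np.cos(phi)
        normal = np.array([4*np.sin(phi)**2*np.cos(theta),
                           4*np.sin(phi)**2*np.sin(theta),
                           4*np.sin(phi)*np.cos(phi)])
        return curlF(x, y, z) @ normal

    surface, _ = dblquad(surface_integrand, 0, 2*np.pi, np.pi/2, np.pi)

    def line_integrand(t):
        # Boundary circle traversed clockwise: phi(t) = (2 cos t, -2 sin t, 0).
        pos = np.array([2*np.cos(t), -2*np.sin(t), 0.0])
        vel = np.array([-2*np.sin(t), -2*np.cos(t), 0.0])
        return F(*pos) @ vel

    line, _ = quad(line_integrand, 0, 2*np.pi)

    print(surface, line, 4*np.pi)   # all three approximately equal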

Warm-Up 2. Suppose that $G$ is a continuous vector field on $\mathbb{R}^3$. We say that surface integrals of $G$ are surface-independent if
\[
\iint_{S_1} G \cdot n\, dS = \iint_{S_2} G \cdot n\, dS
\]
for any oriented smooth surfaces $S_1$ and $S_2$ which have the same boundary and which induce the same orientation on their common boundary. We show that surface integrals of $G$ are surface-independent if and only if the surface integral of $G$ over any closed surface is zero.

Before doing so, here's why we care. First, this is the surface integral analog of the fact for line integrals we showed in a previous Warm-Up: that line integrals of a field are path-independent if and only if the line integral of that field over any closed curve is zero. Second, fields of the form $G = \operatorname{curl} F$, where $F$ is a $C^1$ field, always have these properties as a consequence of Stokes' Theorem. Indeed, in the setup above, Stokes' Theorem gives
\[
\iint_{S_1} \operatorname{curl} F \cdot n\, dS = \int_{\partial S_1} F \cdot T\, ds = \int_{\partial S_2} F \cdot T\, ds = \iint_{S_2} \operatorname{curl} F \cdot n\, dS
\]
since $\partial S_1 = \partial S_2$ with the same induced orientation, so surface integrals of $\operatorname{curl} F$ are surface-independent. Also, if $S$ is closed, then $\partial S = \emptyset$, so
\[
\iint_S \operatorname{curl} F \cdot n\, dS = \int_{\partial S} F \cdot T\, ds = 0
\]
since the latter takes place over a region of volume zero, empty in fact. This is one of the many reasons why curls play a role in surface integral theory similar to the one conservative fields play in line integral theory. (We'll see a deeper reason why this analogy holds next time when we talk about differential forms.)

Suppose that surface integrals of $G$ are surface-independent and let $S$ be a closed oriented surface. Slice through $S$ to create two surfaces $S_1$ and $S_2$ with the same boundary, which is the curve where the slicing occurred.


More information

One-Variable Calculus

One-Variable Calculus POLI 270 - Mathematical and Statistical Foundations Department of Political Science University California, San Diego September 30, 2010 1 s,, 2 al Relationships Political Science, economics, sociology,

More information